**Springer Series on Touch and Haptic Systems**

# Musical Haptics

# Springer Series on Touch and Haptic Systems

#### Series editors

Manuel Ferre Marc O. Ernst Alan Wing

#### Series Editorial Board

Carlo A. Avizzano José M. Azorín Soledad Ballesteros Massimo Bergamasco Antonio Bicchi Martin Buss Jan van Erp Matthias Harders William S. Harwin Vincent Hayward Juan M. Ibarra Astrid M. L. Kappers Abderrahmane Kheddar Miguel A. Otaduy Angelika Peer Jerome Perret Jean-Louis Thonnard

More information about this series at http://www.springer.com/series/8786

Stefano Papetti • Charalampos Saitis Editors


Editors Stefano Papetti ICST—Institute for Computer Music and Sound Technology Zürcher Hochschule der Künste Zurich Switzerland

Charalampos Saitis Audio Communication Group Technische Universität Berlin Berlin Germany

ISSN 2192-2977 ISSN 2192-2985 (electronic) Springer Series on Touch and Haptic Systems ISBN 978-3-319-58315-0 ISBN 978-3-319-58316-7 (eBook) https://doi.org/10.1007/978-3-319-58316-7

Library of Congress Control Number: 2018935220

© The Editor(s) (if applicable) and The Author(s) 2018. This book is an open access publication. Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by the registered company Springer International Publishing AG part of Springer Nature

The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To Pietro Cosmo, who was born at the same time as the conception of this book.

Stefano Papetti

To my parents.

Charalampos Saitis

# Series Editors' Foreword

This is the 15th volume of 'Springer Series on Touch and Haptic Systems', which is published as a collaboration between Springer and the EuroHaptics Society.

Musical Haptics explores haptic interaction during the auditory experience of music and the combination of auditory and haptic information during instrumental performance. Auditory and haptic channels receive vibrations during instrument performance. This multimodal interaction is analysed from the points of view of both the audience and the musicians.

The book is organized into two parts and 13 chapters: the first part is devoted to the fundamentals of haptic interaction and the perception of musical cues, and the second part presents examples of haptic musical interfaces. A glossary at the end of the book defines specific terminology.

A successful workshop on Musical Haptics at the EuroHaptics 2016 conference in London led to the writing of this book. The editors have created an excellent compilation of the work introduced during the workshop and added new material to produce a cutting-edge volume. Moreover, this publication is the first open access issue in this Springer series, which represents an eagerly anticipated development for our community.

January 2018 Manuel Ferre Marc O. Ernst Alan Wing

## Preface

The two fields of haptics and music are naturally connected in a number of ways. As a matter of fact, sound is nothing more than the auditory manifestation of vibration. When attending a concert, we are reached not only by airborne acoustic waves but also by related vibratory cues conveyed through the air and solid media such as the floor and seats. Moving from the audience to the performance stage, it is thanks to a complex system of auditory–haptic interactions established between musicians and their instruments that the former can render subtle expressive nuances and develop virtuosic playing techniques, and that being at a concert is such a rewarding experience.

Whereas auditory research has long addressed the musical scenario, research on haptics has only recently started to consider it. This volume aims to fill this gap by collecting for the first time state-of-the-art contributions from distinguished scholars and young researchers working at the intersection of haptics and music performance. It presents theoretical, empirical, and practical aspects of haptic musical interaction and perception, such as the role of haptics in music performance and fruition, and describes the design and evaluation of digital musical interfaces that provide haptic feedback.

The realization of this volume was originally encouraged by Prof. Manuel Ferre, following the successful organization of a scientific workshop on Musical Haptics by Stefano Papetti at the EuroHaptics 2016 conference. The workshop hosted some of the most renowned world experts in the field and fostered discussion, exchange, and collaboration to help address theoretical and empirical challenges in Musical Haptics research. It was, in a way, the crowning event of the project Audio-Haptic modalities in Musical Interfaces<sup>1</sup> (2014–2016), an interdisciplinary research project funded by the Swiss National Science Foundation, which initiated an exploratory investigation on the role of haptics and the sense of touch in music practice.

<sup>1</sup> http://p3.snf.ch/project-150107 (last accessed on Nov 27, 2017).

The present volume primarily features contributions from presenters at the EuroHaptics workshop. Additional authors were invited based on their established activities and recent outstanding results. Mirroring the implicitly interdisciplinary nature of Musical Haptics, contributions come from a variety of scientific backgrounds, such as music composition and performance, acoustics, mechanical engineering, robotics, sound and music computing, music perception, and cognitive neuroscience, thus bringing diverse viewpoints on a number of common topics.

Following an introduction which sets out the scope, aims, and relevance of Musical Haptics, the volume comprises 12 contributed chapters divided into two parts. Part I examines the relevance of haptic cues in music performance and perception, discussing how they affect user experience and performance in terms of usability, functionality, and perceived quality of musical instruments. Part II presents engineering, computational, and design approaches and guidelines that have been applied to render and exploit haptic feedback in digital musical interfaces. The two parts are distinct yet complementary: studying the perception of haptics requires sophisticated rendering techniques; developing sophisticated rendering techniques for haptics requires a good understanding of its psychophysics. To help the reader, a glossary is included that gathers in one place explanations of concepts and tools recurring throughout the book.

Musical Haptics is intended for haptic engineers, researchers in human–computer interaction, music psychologists, interaction designers, musical instrument designers, and musicians who, for example, would like to gain insight into the haptic exchange between musicians and their instruments, its relevance for user experience, quality perception and musical performance, as well as practical guidelines for the use of haptic feedback in musical devices and other human–computer interfaces. It is hoped that the present volume will contribute towards a scientific foundation of haptic musical interfaces, even though it has not been possible to take all aspects into account.

We thank the Institute for Computer Music and Sound Technology (ICST) at the Zurich University of the Arts (ZHdK) for funding the publication of the present volume in Open Access form, along with the Alexander von Humboldt Foundation for supporting C.S. through a Humboldt Research Fellowship. We are especially grateful to ICST Director Germán Toro-Pérez for his continuous support, as well as to Federico Avanzini and Federico Fontana for their precious organizational advice. Finally, we would like to thank all the authors for their valuable contribution to this book.

Zurich, Switzerland Stefano Papetti Berlin, Germany Charalampos Saitis December 2017


# Contributors

M. Ercan Altinsoy Institut für Akustik und Sprachkommunikation, Technische Universität Dresden, Dresden, Germany

Federico Avanzini Dipartimento di Informatica, Università di Milano, Milano, Italy

Stephen David Beck School of Music & CCT—Center for Computation and Technology, Louisiana State University, Baton Rouge, LA, USA

Edgar Berdahl School of Music & CCT—Center for Computation and Technology, Louisiana State University, Baton Rouge, LA, USA

Michael Blandino School of Music & CCT—Center for Computation and Technology, Louisiana State University, Baton Rouge, LA, USA

Anders Bouwer Faculty of Digital Media and Creative Industries, Amsterdam University of Applied Sciences, Amsterdam, The Netherlands

Claude Cadoz ACROE—Association pour la Création et la Recherche sur les Outils d'Expression & Laboratoire ICA—Ingénierie de la Création Artistique, Institut polytechnique de Grenoble, Université Grenoble Alpes, Grenoble, France

Nicolas Castagné Laboratoire ICA—Ingénierie de la Création Artistique, Institut polytechnique de Grenoble, Université Grenoble Alpes, Grenoble, France

Federico Fontana Dipartimento di Scienze Matematiche, Informatiche e Fisiche, Università di Udine, Udine, Italy

Claudia Fritz Équipe LAM—Lutheries-Acoustique-Musique, Institut Jean le Rond d'Alembert UMR 7190, Université Pierre et Marie Curie - CNRS, Paris, France

Martin Fröhlich ICST—Institute for Computer Music and Sound Technology, Zürcher Hochschule der Künste, Zurich, Switzerland

R. Brent Gillespie Mechanical Engineering, University of Michigan, Ann Arbor, MI, USA

Bruno L. Giordano Institut de Neurosciences de la Timone UMR 7289, Aix-Marseille Université-Centre National de la Recherche Scientifique, Marseille, France

Marcello Giordano IDMIL—Input Devices and Music Interaction Laboratory, CIRMMT—Centre for Interdisciplinary Research in Music Media and Technology, McGill University, Montréal, QC, Canada

Vincent Hayward Sorbonne Universités, Université Pierre et Marie Curie, Institut des Systèmes Intelligents et de Robotique, Paris, France

Oliver Hödl Cooperative Systems Research Group, Faculty of Computer Science, University of Vienna, Vienna, Austria

Simon Holland Music Computing Lab, Centre for Research in Computing, The Open University, Milton Keynes, UK

Hanna Järveläinen ICST—Institute for Computer Music and Sound Technology, Zürcher Hochschule der Künste, Zurich, Switzerland

James Leonard Laboratoire ICA—Ingénierie de la Création Artistique, Institut polytechnique de Grenoble, Université Grenoble Alpes, Grenoble, France

Annie Luciani ACROE—Association pour la Création et la Recherche sur les Outils d'Expression & Laboratoire ICA—Ingénierie de la Création Artistique, Institut polytechnique de Grenoble, Université Grenoble Alpes, Grenoble, France

Sebastian Merchel Institut für Akustik und Sprachkommunikation, Technische Universität Dresden, Dresden, Germany

David Murphy University College Cork, Cork, Ireland

Sile O'Modhrain School of Information & School of Music, Theatre and Dance, University of Michigan, Ann Arbor, MI, USA

Stefano Papetti ICST—Institute for Computer Music and Sound Technology, Zürcher Hochschule der Künste, Zurich, Switzerland

Andrew Pfalz School of Music & CCT—Center for Computation and Technology, Louisiana State University, Baton Rouge, LA, USA

Charalampos Saitis Audio Communication Group, Technische Universität Berlin, Berlin, Germany

Sébastien Schiesser ICST—Institute for Computer Music and Sound Technology, Zürcher Hochschule der Künste, Zurich, Switzerland

John Sullivan IDMIL—Input Devices and Music Interaction Laboratory, CIRMMT—Centre for Interdisciplinary Research in Music Media and Technology, McGill University, Montréal, QC, Canada

Marcelo M. Wanderley IDMIL—Input Devices and Music Interaction Laboratory, CIRMMT—Centre for Interdisciplinary Research in Music Media and Technology, McGill University, Montréal, QC, Canada

Jeffrey Weeter University College Cork, Cork, Ireland

Gareth W. Young University College Cork, Cork, Ireland

# **Chapter 1 Musical Haptics: Introduction**

**Stefano Papetti and Charalampos Saitis**

**Abstract** This chapter introduces the concept of *musical haptics*, its scope, aims, and challenges, as well as its relevance and impact for general haptics and human–computer interaction. A brief summary of subsequent chapters is given.

S. Papetti (B) ICST—Institute for Computer Music and Sound Technology, Zürcher Hochschule der Künste, Pfingsweidstrasse 96, 8005 Zurich, Switzerland e-mail: stefano.papetti@zhdk.ch

C. Saitis Audio Communication Group, Technische Universität Berlin, Sekretariat E-N 8, Einsteinufer 17c, 10587 Berlin, Germany e-mail: charalampos.saitis@campus.tu-berlin.de

S. Papetti and C. Saitis (eds.), *Musical Haptics*, Springer Series on Touch and Haptic Systems, https://doi.org/10.1007/978-3-319-58316-7\_1

## **1.1 Scope and Goals**

Musical haptics is an emerging interdisciplinary field investigating touch and proprioception in music scenarios from the perspectives of haptic engineering, human–computer interaction (HCI), applied psychology, musical acoustics, aesthetics, and music performance.

The goals of musical haptics research may be summarized as: (i) to understand the role of haptic interaction in music experience and instrumental performance, and (ii) to create new musical devices yielding meaningful haptic feedback.

## **1.2 Haptic Cues in Music Practice and Fruition**

Whenever an acoustic or electroacoustic musical instrument produces sound, that sound comes from its vibrating components (e.g., the reed and air column in a clarinet, or the strings and soundboard of a piano). While performing on such instruments, the haptic channel is involved in a complex action–perception loop: the player physically interacts with the instrument, on the one hand generating sound by injecting energy in the form of forces, velocities, and displacements (e.g., striking the keys of a keyboard, or bowing, plucking, and pressing the strings of a violin), and on the other hand receiving and perceiving the instrument's physical response (e.g., the instrument's body vibration, the kinematics of keys being depressed, the resistance and vibration of strings). One could therefore assume that the haptic channel supports performance control (e.g., timing, intonation) as well as expressivity (e.g., timbre, emotion). In particular, skilled performers are known to establish a very intimate, rich haptic exchange with their instruments, resulting in truly embodied interaction that is hard to find in other human–machine contexts. Through training-based learning of haptic cues and auditory–tactile interactions, musicians develop highly precise auditory–motor skills [7, 28]. They then form a base of highly demanding users who expect top-quality interaction with their instruments (i.e., extensive control, consistent response, and maximum efficiency), an interaction that extends beyond mere performance goals to emotional and aesthetic outcomes.

In addition to what is described above, both the performers and the audience are reached by vibration conveyed through air and solid media such as the floor and the seats of a concert hall. Those vibratory cues may then contribute to the perception of music (e.g., its perceived quality) and of instrumental performance (e.g., in an ensemble, a player may be able to monitor others' performances through such cues as well).

Music fruition and performance therefore present a well-defined framework in which to study basic psychophysical, perceptual, and biomechanical aspects of touch and proprioception, all of which may inform the design of novel haptic musical devices. There is now a growing body of scientific studies of music performance and perception from which to inform research in musical haptics, including topics and methods from the fields of psychophysics [19], biomechanics [11], music education [29], psycholinguistics [32], and artificial intelligence [20].

## **1.3 Musical Devices and Haptic Feedback**

While current digital musical instruments (DMIs) usually offer touch-mediated interaction, they fall short of providing a natural physical experience to the performer. With a few exceptions, they lack haptic cues other than those intrinsically provided by their (passive) mechanics, if any (e.g., the kinematics of a digital piano keyboard); in other words, their behavior is the same whether they are turned on or off. This missing link between sound production and active haptic feedback, added to the fact that even sophisticated sound synthesis cannot (yet?) compete with the complexity and liveliness of acoustically generated sound, generally makes the experience of performing on DMIs less rewarding and rich than playing traditional instruments. Try asking a professional pianist, especially a classically trained one, to play a digital piano and watch out! However, one could argue that establishing a rich haptic exchange between musicians and their digital tools would enhance performance control, expressivity, and user experience, while the music listening experience would be improved by conveying audio-related vibratory cues to the listener. Indeed, a recently renewed interest in advancing haptic interaction design for everyday intelligent interfaces, shared across the HCI and engineering communities as well as the consumer electronics industry, promotes the idea that haptics has the potential to greatly improve usability, engagement, learnability, and the overall experience of the user, with minimal or no requirement for constant visual attention [15, 17]. For example, haptic feedback is already used to improve robotic control in surgical teleoperation [27] and to increase realism and immersion in virtual reality applications [30].

With regard to applications, haptic musical interfaces may provide feedback on the performance itself or on various musical processes (e.g., representing a score). In addition to enhancing performance control and expressivity, they have a high potential as tools for music tuition, for providing guidance in (intrinsically noisy) large ensembles and remote performance scenarios, and for facilitating access to music practice and fruition for persons affected by somatosensory, visual, and even hearing impairments [6, 13, 21]. A notable example is the virtuoso and profoundly deaf percussionist Evelyn Glennie, who has explained her use of vibrotactile cues in musical performance, to the point of recognizing pitch based on where the vibrations are felt on her body [10]. A further potential application of programmable haptic feedback in musical interfaces is to offer a way of prototyping the mechanical response of components found in traditional instruments (e.g., the kinematics and vibratory behavior of a piano keyboard), thus saving time and lowering production costs compared with traditional hardware development.

Some efforts have been made in recent years to define a systematic approach for the design of haptic DMIs and to assess their utility [3, 9, 23]. Some of the developed prototypes simulate the haptic behavior of existing acoustic or electroacoustic instruments, while others implement new paradigms not necessarily linked to traditional instruments. Early examples of haptic musical interfaces consist of piano-like keyboards with computer-driven mechanical feedback for simulating touch responses of various keyboard instruments (e.g., harpsichord, organ, piano) [4, 8]. More recently, a haptic system using magneto-rheological technology was developed that could reproduce the dynamic behavior of piano keyboards [16]. A vibrotactile feedback system for open-air music controllers, based on an actuated ring or a foot stimulator, was proposed in [31]. Haptic DMIs inspired by traditional instruments (violin, woodwinds, monochord, and slide whistle) are described in [2, 18, 22]. In [26], actuators were used on acoustic and electroacoustic instruments to feed mechanical energy back and induce or dampen resonances.

Only a few commercial examples of haptic musical devices are currently found. The Yamaha AvantGrand<sup>1</sup> series of digital pianos embeds vibration transducers that simulate the effect of vibrating strings and soundboard, and pedal depression. The system can be turned on or off, and the vibration intensity can be adjusted. The Ultrasonic Audio Syntact<sup>2</sup> is a midair musical interface that performs hand-gesture analysis by means of a camera and provides tactile feedback at the hand through an array of ultrasonic transducers. The Soundbrenner Pulse<sup>3</sup> is a wearable vibrotactile metronome. The Lofelt Basslet<sup>4</sup> and Subpac<sup>5</sup> are wearable low-frequency vibration transducers (tactile subwoofers) in the form of a bracelet and a vest, respectively, whose goal is to enhance the music listening experience.

<sup>1</sup>https://europe.yamaha.com/en/products/musical\_instruments/pianos/avantgrand/ (last accessed on Dec 7, 2017).

<sup>2</sup>http://www.ultrasonic-audio.com/products/syntact.html (last accessed on Dec 7, 2017).

## **1.4 Challenges**

Research in musical haptics faces several challenges, some of which are common to haptic engineering and HCI in general.

From a technology viewpoint, the use of sensors and actuators can be especially problematic because haptic musical interfaces should generally be compact and unobtrusive (to allow for seamless interaction), efficient in terms of power (so they can be compatible with current consumer electronics industrial processes), and offer high fidelity/accuracy (to enable sensing subtle gestures and rendering complex haptic cues). Musical haptics would then gain from further developments in sensing and actuator technology in those directions.

From the perspective of HCI and psychophysics, the details of how the haptic modality is actually involved and exploited while performing with traditional musical instruments or while listening to music are still largely unknown. More psychophysical evidence and behavioral evidence are needed to establish the biomechanics of touch and how haptic cues affect measurable performance parameters such as accuracy in timing, intonation, and dynamics, as well as to better understand the role of vibration in idiosyncratic perceptions of sound/instrument quality by performers and music/sound aesthetics by listeners.

What is more, haptic musical interfaces are interactive systems that require rigorous user experience evaluation to help define optimal configurations between perceptual effects and limitations on the one hand, and technological solutions on the other [5, 12, 33]. Despite the fact that several evaluation frameworks have been proposed [14, 24, 34], the evaluation of digital musical devices and related user experience currently suffers from a lack of commonly accepted goals, criteria, and methods [1, 25].

## **1.5 Outline**

The first part of the book presents theoretical and empirical work in musical haptics with particular emphasis on biomechanical, psychophysical, and behavioral aspects of music performance and music perception. Chapter 2 redefines, with an original perspective, the biomechanics of the musician–instrument interaction as a tight dynamic coupling, rather than the mere interaction of two separate entities. Chapter 3 introduces basic concepts and functions related to the anatomy and physiology of the human somatosensory system with special focus on the perception of touch, pressure, vibration, and movement. Chapter 4 reports experiments investigating vibrotactile perception in finger-pressing tasks and while performing on the piano. Chapter 5 examines the role of vibrotactile cues on the perception of sound/instrument quality from the perspective of the musician, based on recent psycholinguistic and psychophysical evidence from violin and piano studies. Chapter 6 reports an experiment that uses quantitative and qualitative HCI evaluation methods to assess how various types of haptic feedback on a DMI affect aspects of functionality, usability, and user experience. Chapter 7 considers a music listening scenario for different musical genres and tests how body vibrations, generated from the original audio signal using a variety of approaches, influence the musical experience of the listener.

<sup>3</sup>http://www.soundbrenner.com (last accessed on Dec 23, 2017).

<sup>4</sup>https://lofelt.com/ (last accessed on Dec 7, 2017).

<sup>5</sup>http://subpac.com/ (last accessed on Dec 23, 2017).

The second part of the volume presents design examples, applications, and evaluations of haptic musical interfaces. Chapter 8 describes an advanced hardware–software system for real-time rendering of physically modeled virtual instruments that can be played with force feedback, and its use as a creative artistic tool. Chapter 9 examines hardware and computing solutions for the development of haptic force-feedback DMIs through a case study of music compositions for the Laptop Orchestra of Louisiana. Chapter 10 proposes and evaluates the design of a taxonomy of vibrotactile cues and a stimulation system consisting of wearable garments for providing information similar to a score during music performance. Chapter 11 reports a series of experiments investigating the design and evaluation of vibrotactile stimulation for learning rhythm skills of varying complexity, with a special emphasis on multilimb coordination. Chapter 12 evaluates the use of touchscreen interfaces augmented with audio-driven vibrotactile cues in music production, focusing on performance, user experience, and the cross-modal effect of audio loudness on tactile intensity. Chapter 13 illustrates common vibrotactile actuator technology and provides three examples of audio-haptic interfaces iteratively designed through validation procedures that tested their accuracy in measuring user gesture and in delivering vibrotactile cues.

A glossary at the end of the book provides descriptions (including related abbreviations) of concepts and tools that are frequently mentioned throughout the volume, offering a useful background for those less acquainted with haptic and music technology.

## **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Part I Musical Haptics: Interaction and Perception

# **Chapter 2 Once More, with Feeling: Revisiting the Role of Touch in Performer-Instrument Interaction**

**Sile O'Modhrain and R. Brent Gillespie**

**Abstract** The dynamical response of a musical instrument plays a vital role in determining its playability. This is because, for instruments where there is a physical coupling between the sound-producing mechanism of the instrument and the player's body (as with any acoustic instrument), energy can be exchanged across points of contact. Most instruments are strong enough to push back; they are springy, have inertia, and store and release energy on a scale that is appropriate and well matched to the player's body. Haptic receptors embedded in skin, muscles, and joints are stimulated to relay force and motion signals to the player. We propose that the performer-instrument interaction is, in practice, a dynamic coupling between a mechanical system and a biomechanical instrumentalist. We take a stand on what is actually under the control of the musician, claiming it is not the instrument that is played, but the dynamic system formed by the instrument coupled to the musician's body. In this chapter, we suggest that the robustness, immediacy, and potential for virtuosity associated with acoustic instrument performance are derived, in no small measure, from the fact that such interactions engage both the active and passive elements of the sensorimotor system and from the musician's ability to learn to control and manage the dynamics of this coupled system. This, we suggest, is very different from an interaction with an instrument whose interface only supports information exchange. Finally, we suggest that a musical instrument interface that incorporates dynamic coupling likely supports the development of higher levels of skill and musical expressiveness.

S. O'Modhrain (B) School of Information & School of Music, Theatre and Dance, University of Michigan, 2051 Moore Building, 1100 Baits Dr, Ann Arbor, MI 48109-2085, USA e-mail: sileo@umich.edu

R. B. Gillespie Mechanical Engineering, University of Michigan, 3450 GG Brown Building, 2350 Hayward Street, Ann Arbor, MI 48109-2525, USA e-mail: brentg@umich.edu

© The Author(s) 2018

S. Papetti and C. Saitis (eds.), *Musical Haptics*, Springer Series on Touch and Haptic Systems, https://doi.org/10.1007/978-3-319-58316-7\_2

## **2.1 Introduction**

The mechanics of a musical instrument's interface—what the instrument feels like—determines a great deal of its playability. What the instrument provides to be held, manipulated by mouth or hand, or otherwise controlled has obvious but also many subtle implications for how it can be used for musical expression. One means to undertake an analysis of playability and interface mechanics is in terms of the mechanical energy that is exchanged between a player's body and the instrument. For acoustic instruments, mechanical energy injected by the player is transformed into acoustic energy through a process of resonance excitation. For electronic instruments, electrical energy is generally transformed into acoustic energy through a speaker, but controlled by interactions involving the player's body and some physical portion of the instrument.

Importantly, there exists the possibility for mechanical energy stored in the physical part of the instrument to be returned to the player's body. This possibility exists for both acoustic and electronic instruments, though in acoustic instruments it is in fact a likelihood. This likelihood exists because most acoustic instruments are strong enough to push back; they are springy, have inertia, and store and return energy on a scale that is roughly matched to the scale at which the player's body stores and returns energy. Given that energy storage and return in the player's body is determined by passive elements in muscle and tissues, one can say that the scale at which interface elements of the instrument are springy and have mass is similar to the scale at which muscles and tissues of the player are springy and have mass. That is, the mechanics of most acoustic instruments are roughly impedance matched to the biomechanics of the player's body. Impedance matching facilitates the exchange of energy between passive elements within the instrument and passive elements that are part of the biomechanics of the player. Thus the player's joints are moved or backdriven by the instrument, muscle stiffness is loaded, and the inertial dynamics of body segments are excited. In turn, haptic receptors embedded in skin, muscles, and joints are stimulated and relay force and motion signals to the player. It is also no accident that the parts of the body that interact with instruments—lips, fingers, hands—are the most highly populated by haptic receptors.
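The notion of impedance matching used above can be illustrated with a textbook lumped-element calculation. The following is a minimal sketch and not a model taken from this chapter: it assumes linear mass-spring-damper elements in sinusoidal steady state, with a force source whose internal impedance (standing in for the player) is in series with a load impedance (standing in for the instrument), so that both share one velocity. The function names and all parameter values are hypothetical.

```python
import math


def mechanical_impedance(mass, damping, stiffness, omega):
    """Impedance Z = F / v of a mass-spring-damper at angular frequency omega."""
    return complex(damping, mass * omega - stiffness / omega)


def power_to_load(force_amplitude, z_source, z_load):
    """Average power absorbed by z_load when a sinusoidal force source with
    internal impedance z_source drives it (series connection, common velocity)."""
    velocity = force_amplitude / (z_source + z_load)
    return 0.5 * abs(velocity) ** 2 * z_load.real


if __name__ == "__main__":
    omega = 2.0 * math.pi * 10.0  # 10 Hz drive, arbitrary choice
    # Hypothetical "player" impedance: 50 g, 2 N s/m, 500 N/m.
    z_player = mechanical_impedance(0.05, 2.0, 500.0, omega)
    for label, z_instr in [
        ("matched (conjugate)", z_player.conjugate()),
        ("10x more damping", complex(10 * z_player.real, -z_player.imag)),
        ("10x less damping", complex(0.1 * z_player.real, -z_player.imag)),
    ]:
        print(label, power_to_load(1.0, z_player, z_instr))
```

Under these assumptions the power delivered to the load is largest when the load impedance is the complex conjugate of the source impedance, where it equals F²/(8R) for force amplitude F and source damping R. Loads whose scale is far from that of the source, in either direction, absorb less power.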

In this chapter, we propose that performer-instrument interaction is a dynamic coupling between a mechanical system and a biomechanical instrumentalist. This repositions the challenge of playing an instrument as a challenge of "playing" the coupled dynamics in which the body is already involved. We propose that interactions in which both the active and passive elements of the sensorimotor system (see Chap. 3) are engaged form a backdrop for musical creativity that is much more richly featured than the set of actions one might impose on an instrument considered in isolation from the player's body. We further wish to propose that the robustness, immediacy, and potential for virtuosity associated with acoustic instrument performance are derived, in no small measure, from the fact that such interactions engage both the active and passive elements of the sensorimotor system and determine the musician's ability to learn and manage the dynamics of this coupled system. This, we suggest, is very different from an interaction with an electronic instrument whose interface is only designed to support information exchange.

We also suggest that a musical instrument interface that incorporates dynamic coupling supports the development of higher levels of skill and musical expressiveness. To elaborate these proposals concretely, we will adopt a modeling approach that explicitly considers the role of the musician's body in the process of extracting behaviors from a musical instrument. We will describe the springiness, inertia, and damping in both the body and the instrument in an attempt to capture how an instrument becomes an extension of the instrumentalist's body. And insofar as the body might be considered an integral part of the process of cognition, so too does an instrument become a part of the process of finding solutions to musical problems and producing expressions of musical ideas.

## **2.2 A Musician Both Drives and Is Driven by Their Instrument**

The standard perspective on the mechanics of acoustic instruments holds that energy is transformed from the mechanical to the acoustic domain—mechanical energy passes from player to instrument and is transformed by the instrument, at least in part, to acoustic energy that emanates from the instrument into the air. Models that describe the process by which mechanical excitation produces an acoustic response have been invaluable for instrument design and manufacture and have played a central role in the development of sound synthesis techniques, including modal synthesis [1] and especially waveguide synthesis [2] and physical modeling synthesis algorithms [3–5]. The role of the player in such descriptions is to provide the excitation or to inject energy. Using this energy-based model, the question of "control," or how the player extracts certain behaviors including acoustic responses from the instrument reduces to considering how the player modulates the amount and timing of energy injected.

While an energy-based model provides a good starting point, we argue here that a musician does more than modulate the amount and timing of excitation. Elaborating further on the process of converting mechanical into acoustic energy, we might consider that not all energy injected is converted into acoustic energy. A portion of the energy is dissipated in the process of conversion or in the mechanical action of the instrument and a portion might be reflected back to the player. As an example, in Fig. 2.1, we show that a portion of the energy injected into the piano action by the player at the key is converted to sound, another portion is dissipated, and yet another portion is returned back to the player at the mechanical contact.

But a model that involves an injection of mechanical energy by the player does not imply that all energy passes continuously in one direction, nor even that the energy passing between player and instrument is under instantaneous control of the player. There might also exist energy exchanges between the player's body and the instrument whose time course is instead governed by the coupling of mechanical energy storage elements in the player's body and the instrument. Conceivably, energy may even oscillate back and forth between the player and instrument, as governed by the coupled dynamics. For example, multiple strikes of a drumstick on a snare drum are easily achieved with minimal and discrete muscle actions because potential energy may be stored and returned in not only the drumhead but also in the finger grip of the drummer. To drive these bounce oscillations, the drummer applies a sequence of discrete muscle actions at a much slower rate than the rate at which the drumstick bounces. Then to control the bounce oscillation rate, players modulate the stiffness of the joints in their hand and arm [6].

**Fig. 2.1** In response to energy injected at the key, the piano action reflects a portion, dissipates a portion, and converts another portion into output sound

We see, then, that energy exchanges across a mechanical contact between musician and instrument yield new insights into the manner in which a player extracts behavior from an acoustic instrument. Cadoz and Wanderley, in defining the functions of musical gesture, refer to this exchange of mechanical energy as the "ergotic" function, the function which requires the player to do work upon the instrument mechanism [7]. Chapter 8 describes a software–hardware platform that addresses such issues. We extend this description here to emphasize that the instrument is a system which, once excited, will also "do work" on the biomechanical system that is the body of the player. In particular, we shall identify passive elements in the biomechanics of the player's body upon which the instrument can "do work" or within which energy returned from the instrument can be stored in the player's body, without volitional neural control by the player's brain. The drumming example elaborated above already gives a flavor for this analysis. It is now important to consider the *bio*mechanics of the player's body.

Note that relative to virtually all acoustic musical instruments, the human body has a certain *give*, or *bends under load.* Such bending under load occurs even when the body is engaged in manually controlling an instrument. In engineering terms, the human body is said to be *backdrivable*. And this backdrivability is part of the match in mechanical impedance between body and instrument. Simple observations support this claim, such as excursions that take place at the hand without volitional control if the load from an instrument is unexpectedly applied or removed. Think for example of the sudden slip of the bowing hand when the bow-string interaction fails because of a lack of rosin [8]. It follows that significant power is exchanged between the player and instrument, even when the player is passive. Such power exchanges cannot be captured by representing the player as a motion source (an agent capable of specifying a motion trajectory without regard to the force required) or a force source (an agent capable of specifying a force trajectory without regard to the motion required). Because so much of the passive mechanics of the player's body is involved, the contact between human and machine turns out to be a poor place at which to divide the human/machine system into manageable parts for the purposes of modeling.

If good playability were to be equated with high control authority and the backdrivable biomechanics ignored, then an instrument designer might maximize instrument admittance while representing the player as a motion source or maximize instrument impedance while representing the player as a force source. Indeed, this approach to instrument design has, on the one hand, produced the gestural control interface that provides no force feedback and, on the other hand, produced the touch screen that provides no motion feedback. But here we reject representations of the player as motion or force source and label approaches which equate playability with high control authority as misdirected. We contend that the gestural control interface lacking force feedback and the touch screen are failures of musical instrument interface design (Chap. 12 discusses the use of touch screen devices with tactile feedback for pattern-based music composition and mixing). We claim that increasing a player's control authority does not amount to increasing the ability of the player to express their motor intent. Instead, the impedance of the instrument should be matched to that of the player, to maximize power transfer between player and machine and thereby increase the ability of the player to express their motor (or musical expression) intent. Our focus on motor intent and impedance rather than control authority amounts to a fundamental change for the field of human motor control and has significant implications for the practice of designing musical instruments and other machines intended for human use.
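The power-transfer argument can be illustrated with a deliberately simplified sketch. It assumes a purely resistive (damper-only) analogue in which a hypothetical constant force `force` acts through a source damping `b_source` on a load damping `b_load`, with both sharing one velocity. None of these names or values come from the chapter, and real players and instruments also have mass and stiffness.

```python
import numpy as np


def delivered_power(b_load, b_source=2.0, force=1.0):
    """Power (W) dissipated in a load damper driven through a source damper.

    Assumed toy model: source and load share one velocity,
    v = force / (b_source + b_load), and the load absorbs b_load * v**2.
    """
    velocity = force / (b_source + b_load)
    return b_load * velocity ** 2


# Sweep the load damping (N*s/m) and find where delivered power peaks.
loads = np.linspace(0.1, 10.0, 1000)
best_load = loads[np.argmax(delivered_power(loads))]
print(f"power peaks near b_load = {best_load:.2f} N*s/m")
```

In this toy model the delivered power falls off on either side of the matched load, which is the sense in which a very low or a very high instrument impedance limits the exchange of power.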

## **2.3 The Coupled Dynamics: A New Perspective on Control**

In this chapter, we are particularly interested in answering how a musician controls an instrument. To competently describe this process, our model must capture two energy-handling processes in addition to the process by which mechanical energy is converted into acoustic energy: First, how energy is handled by the instrument interface, and second, how it is handled by the player's body. Thereafter, we will combine these models to arrive at a complete system model in which not only energy exchanges, but also information exchanges can be analyzed, and questions of playability and control can be addressed.

For certain instruments, the interface mechanics have already been modeled to describe what the instrument feels like to the player. Examples include models that capture the touch response of the piano action [9, 10] and feel of the drum head [11].

To capture the biomechanics of the player, suitable models are available from many sources, though an appropriately reduced model may be a challenge to find. In part, we seek a model describing what the player's body "feels like" to the instrument, the complement of a model that describes what the instrument feels like to the player. We aim to describe the mechanical response of the player's body to mechanical excitation at the contact with the instrument. Models that are competent without being overly complex may be determined by empirical means, or by system identification. Hajian and Howe [12] determined the response of the fingertip to a pulse force and Hasser and Cutkosky determined the response of a thumb/forefinger pinch grip to a pulse torque delivered through a knob [13]. Both of these works proposed parametric models in place of non-parametric models, showing that simple second-order models with mass, stiffness, and damping elements fit the data quite well. More detailed models are certainly available from the field of biomechanics, where characterizations of the driving point impedance of various joints in the body can be helpful for determining state of health. Models that can claim an anatomical or physiological basis are desirable, but such models run the risk of contributing complexity that would complicate the treatment of questions of control and playability.
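As a rough illustration of what such a second-order model looks like in practice, the sketch below simulates the displacement of a single mass-spring-damper in response to a brief force pulse. The parameter values are hypothetical and chosen only to give a plausible-looking response; they are not the values identified in [12] or [13], and the function names are assumptions made for this example.

```python
import math


def second_order_params(mass, damping, stiffness):
    """Return (natural frequency in Hz, damping ratio) of m*x'' + b*x' + k*x = f."""
    omega_n = math.sqrt(stiffness / mass)
    zeta = damping / (2.0 * math.sqrt(stiffness * mass))
    return omega_n / (2.0 * math.pi), zeta


def pulse_response(mass, damping, stiffness, force=1.0, pulse_s=0.01,
                   duration_s=0.2, dt=1e-4):
    """Displacement samples (m) for a rectangular force pulse applied from rest."""
    x, v, samples = 0.0, 0.0, []
    for step in range(int(round(duration_s / dt))):
        f = force if step * dt < pulse_s else 0.0
        a = (f - damping * v - stiffness * x) / mass
        v += a * dt          # semi-implicit Euler: update velocity first,
        x += v * dt          # then position with the new velocity
        samples.append(x)
    return samples


# Hypothetical parameter values, chosen only for illustration.
M, B, K = 0.005, 1.0, 200.0   # kg, N*s/m, N/m
freq_hz, zeta = second_order_params(M, B, K)
response = pulse_response(M, B, K)
print(f"natural frequency {freq_hz:.1f} Hz, damping ratio {zeta:.2f}")
print(f"peak displacement {max(response) * 1e3:.2f} mm")
```

Fitting the three parameters to measured pulse responses is the system identification step; the sketch only shows the forward model that such a fit would use.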

Models that describe what the instrument and body feel like to each other are both models of driving-point impedance. They each describe relationships between force and velocity at the point of contact between player and instrument. The driving-point impedance of the instrument expresses the force response of the instrument to a velocity imposed by the player, and the driving-point impedance of the player expresses the force response of the player to a velocity imposed by the instrument. Of course, only one member of the pair can impose a force at the contact. The other subsystem must respond with velocity to the force imposed at the contact; thus, its model must be expressed as a driving-point *admittance*. This restriction as to which variable may be designated an input and which an output is called a *causality restriction* (see, e.g., [14]). The designation is an essentially arbitrary choice that must be made by the analyst. Let us choose to model the player as an admittance (imposing velocity at the contact) and the instrument as an impedance (imposing force at the contact).
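For a lumped mass-spring-damper interface whose elements share a common velocity, the two descriptions can be written in the Laplace domain as follows. This is a standard textbook form offered as an illustration; the symbols $m$, $b$, $k$, $F$, and $V$ are notation assumed here rather than taken from the chapter.

$$
Z(s) = \frac{F(s)}{V(s)} = m s + b + \frac{k}{s},
\qquad
Y(s) = \frac{V(s)}{F(s)} = \frac{1}{Z(s)} = \frac{s}{m s^{2} + b s + k}.
$$

An impedance $Z$ takes velocity as its input and returns force, whereas an admittance $Y$ takes force as its input and returns velocity.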

Driving-point impedance models that describe what the body or instrument feel like to each other provide most, but not all of what is needed to describe how a player controls an instrument. A link to muscle action in the player and a link to the process by which mechanical energy is converted into acoustic energy in the instrument are still required. In particular, our driving-point admittance model of the player must be elaborated with input/output models that account for the processing of neural and mechanical signals in muscle. In addition, our driving-point impedance model of the instrument must be elaborated with an input/output model that accounts for the excitation of a sound generation process. If our driving-point admittance and impedance models are lumped parameter models in terms of mechanical mass, spring, and damping elements, then we might expect the same parameters to appear in the input/output models that we use to capture the effect of muscle action and the process of converting mechanical into acoustic energy.

**Fig. 2.2** Musician and instrument may both be represented as multi-input, multi-output systems. Representing the instrument in this way, an operator G transforms mechanical excitation into mechanical response. An operator P transforms mechanical excitation into acoustic response. Representing the player, let H indicate the biomechanics of the player's body that determines the mechanical response to a mechanical excitation. The motor output of the player also includes a process M, in which neural signals are converted into mechanical action. The response of muscle M to neural excitation combines with the response of H to excitation from the instrument to produce the action of the musician on the instrument. The brain produces neural activation of muscle by monitoring both haptic and acoustic sensation. Blue arrows indicate neural signaling and neural processing while red arrows indicate mechanical signals and green arrows indicate acoustic signals

Let us represent the process inside the instrument that transforms mechanical input into mechanical response as an operator *G* (see Fig. 2.2). This is the driving-point impedance of the instrument. And let the process that transforms mechanical input into acoustic response be called *P*. Naturally, in an acoustic instrument both *G* and *P* are realized in mechanical components. In a digital musical instrument, *P* is often realized in software as an algorithm. In a motorized musical instrument, even *G* can be realized in part through software [15].

As described above, in *P*, there is generally a change in the frequency range that describes the input and output signals. The input signal, or excitation, occupies a low-frequency range, usually compatible with human motor action. The relatively high-frequency range of the output is determined in an acoustic instrument by a resonating instrument body or air column that is driven by the actions of the player on the instrument. Basically, motor actions of the player are converted into acoustic frequencies in the process *P*. On the other hand, *G* does not usually involve a change in frequency range.

Boldly, we represent the musician as well, naming the processes (operators) that transform input to output inside the nervous system and body of the musician. Here we identify both neural and mechanical signals, and we identify processes that transform neural signals, processes that transform mechanical signals (called biomechanics), transducers that convert mechanical into neural signals (mechanoreceptors and proprioceptors), and transducers that convert neural into mechanical signals (muscles). Sect. 3.3.1 provides a description of such mechanisms. Let us denote those parts of the musician's body that are passive or have only to do with biomechanics by the operator *H*. Biomechanics encompasses stiffness and damping in muscles and mass in bones and flesh. That is, biomechanics includes the capacity to store and return mechanical energy in either potential (stiffness) or kinetic (inertial) forms and to dissipate energy in damping elements. Naturally, there are other features in the human body that produce a mechanical response to a mechanical input that involve transducers (sensory organs and muscles) including reflex loops and sensorimotor loops. Sensorimotor loops generally engage the central nervous system and often some kind of cognitive or motor processing. These we have highlighted in Fig. 2.2 as a neural input into the brain and as a motor command that the brain produces in response. We also show the brain as the basis for responding to an acoustic input with a neural command to muscle. Finally, we represent muscle as the operator *M* that converts neural excitation into a motor action. The ears transform acoustic energy into neural signals available for processing and the brain in turn generates muscle commands that incite the action of the musician on the instrument. Figure 2.2 also represents the action of the musician on the instrument as the combination of muscle actions through *M* and response to backdrive by the instrument through *H*. Note that the model in Fig. 2.2 makes certain assumptions about superposition, though not all operators need be linear.

**Fig. 2.3** Instrument playing considered as a control design problem. **a** The musician, from the position of controller in a feedback loop, imposes their control actions on the instrument while monitoring the acoustic and haptic response of the instrument. **b** From the perspective of dynamic coupling, the "plant" upon which the musician imposes control actions is the system formed by the instrument and the musician's own body (biomechanics)

This complete model brings us into position to discuss questions in control, that is, how a musician extracts desired behaviors from an instrument. We are particularly interested in how the musician formulates a control action that elicits a desired behavior or musical response from an instrument. We will attempt to unravel the processes in the formulation of a control action, including processes that depend on immediately available sensory input (feedback control) and processes that rely on memory and learning (open-loop control).

As will already be apparent, the acoustic response of an instrument is not the only signal available to the player as feedback. In addition, the haptic response functions as feedback, carrying valuable information about the behavior of the instrument and complementing the acoustic feedback. Naturally, the player, as controller in a feedback loop, can modify his or her actions on the instrument based on a comparison of the desired sound and the actual sound coming from the instrument. But the player can also modify his or her actions based on a comparison of the feel of the instrument and a desired or expected feel. A music teacher quite often describes a desired feel from the instrument, encouraging a pupil to adjust actions on the instrument until such a mechanical response can be recognized in the haptic response. One of the premises of this volume is that this second, haptic, channel plays a vital role in determining the "playability" of an instrument, i.e., in providing a means for the player to "feel" how the instrument behaves in response to their actions.

In the traditional formulation, the instrument is the system under control or the "plant" in the feedback control system (see Fig. 2.3a). As controller, the player aims to extract a certain behavior from the instrument by imposing actions and monitoring responses. But given that the haptic response impinges on the player across the same mechanical contact as the control action imposed by the player, an inner feedback loop is closed involving only mechanical variables. Neural signals and the brain of the instrument player are not involved. The mechanical contact and the associated inner feedback loop involve the two variables force and velocity whose product is power and is the basis for energy exchanges between player and instrument. That is, the force and motion variables that we identify at the mechanical contact between musician and instrument are special in that they transmit not only information but also mechanical energy. That energy may be expressed as the time integral of power, the product of force and velocity at the mechanical contact. As our model developed above highlights, a new dynamical system arises when the body's biomechanics are coupled to the instrument mechanics. We shall call this new dynamical system the *coupled dynamics*. The inner feedback loop, which is synonymous with the coupled dynamics, is the new "plant" under control (see Fig. 2.3b). The outer feedback loop involves neural control and still has access to feedback in both haptic and audio channels.
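The relation between power and energy at the contact can be made concrete with a short numerical sketch. It assumes a hypothetical spring-damper load driven by a sinusoidal velocity for exactly one cycle; the element values and the function name `exchanged_energy` are illustrative assumptions, not quantities from the chapter.

```python
import numpy as np


def exchanged_energy(force, velocity, time):
    """Energy (J) crossing the contact: trapezoidal integral of force * velocity."""
    power = force * velocity
    return float(np.sum(0.5 * (power[1:] + power[:-1]) * np.diff(time)))


# Hypothetical spring-damper load driven with a sinusoidal velocity for one cycle.
K, B = 500.0, 1.0            # N/m, N*s/m (illustrative values)
V_AMP, FREQ = 0.1, 2.0       # m/s, Hz
t = np.linspace(0.0, 1.0 / FREQ, 20001)
omega = 2.0 * np.pi * FREQ
v = V_AMP * np.sin(omega * t)
x = (V_AMP / omega) * (1.0 - np.cos(omega * t))   # displacement, x(0) = 0

spring_energy = exchanged_energy(K * x, v, t)     # stored, then returned
damper_energy = exchanged_energy(B * v, v, t)     # dissipated
print(f"spring: {spring_energy:.6f} J, damper: {damper_energy:.6f} J")
```

Over a full cycle the spring's net energy is essentially zero because what it stores it also returns, while the damper's share is lost for good. Only the stored-and-returned part can flow back into the player's body.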

In considering the "control problem," we see that the coupled dynamics is a different system, possibly more complex, than the instrument by itself. Paradoxically, the musician's brain is faced with a greater challenge when controlling the coupled dynamical system that includes the combined body and instrument dynamics. There are new degrees of freedom (DoF) to be managed—dynamic modes that involve exchanges of potential and kinetic energy between body and instrument. But something unique takes place when the body and instrument dynamics are coupled. A feedback loop is closed and the instrument becomes an extension of the body. The instrument interface disappears and the player gains a new means to effect change in their environment. This sense of immediacy is certainly at play when a skilled musician performs on an acoustic instrument.

But musical instruments are not generally designed by engineers. Rather, they are designed by craftsmen and musicians, usually by way of many iterations of artistry and skill. Oftentimes that skill is handed down through generations in a process of apprenticeship that lacks engineering analysis altogether. Modern devices designed by engineers, on the other hand, might function as extensions of the brain, but not so much as extensions of the body. While there is no rule that says a device containing a microprocessor must present a vanishingly small or astronomically large mechanical impedance to its player, it can be said that digital instrument designers to date have been largely unaware of the alternatives. Is it possible to design a digital instrument whose operation profits from power exchanges with its human player? We aim to capture the success of devices designed through craftsmanship and apprenticeship in models and analyses and thereby inform the design of new instruments that feature digital processing and perhaps embedded control.

## **2.4 Inner and Outer Loops in the Interaction Between Player and Instrument**

Our new perspective, in which the "plant" under control by the musician is the dynamical system determined conjointly by the biomechanics of the musician and the mechanics of the instrument, yields a new perspective on the process of controlling and learning to control an instrument. Consider for a moment the superior access that the musician has to feedback from the dynamics of the coupled system relative to feedback from the instrument. The body is endowed with haptic sensors in the lips and fingertips, but also richly endowed with haptic and proprioceptive sensors in the muscles, skin, and joints. Motions of the body that are determined in part by muscle action but also in part by actions of the instrument on the body may easily be sensed. A comparison between such sensed signals and expected sensations, based on known commands to the muscles, provides the capability of estimating states internal to the instrument. See, for example, [16].

The haptic feedback thus available carries valuable information for the musician about the state of the instrument. The response might even suggest alternative actions or modes of interaction to the musician. For example, the feel of let-off in the piano action (after which the hammer is released) and the feel of the subsequent return of the hammer onto the repetition lever and key suggest the availability of a rapid repetition to the pianist.

Let us consider cases in which the coupled dynamics provides the means to achieve oscillatory behaviors with characteristic frequencies that are outside the range of human volitional control. Every mechanical contact closes a feedback loop, and closing a feedback loop between two systems capable of storing and returning energy creates a new dynamic behavior. Speaking mechanically, if the new mode is underdamped, it would be called a new resonance or vibration mode. On the one hand, the force and motion variables support the exchange of mechanical energy; on the other hand, they create a feedback loop that is characterized by a resonance. Since we have identified a mechanical subsystem in both the musician and the instrument, it is noteworthy that these dynamics are potentially quite fast. There is no neural transmission nor cognitive processing that takes place in this pure mechanical loop.
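A minimal sketch of how coupling produces a mode that neither subsystem has on its own is given below. It assumes the simplest possible case, a rigid contact at which a hypothetical instrument mass and spring move together with a hypothetical body mass and spring, so that the masses add and the springs act in parallel. All names and values are illustrative assumptions, and damping is ignored.

```python
import math


def natural_frequency_hz(mass, stiffness):
    """Undamped natural frequency (Hz) of a single mass on a spring."""
    return math.sqrt(stiffness / mass) / (2.0 * math.pi)


def coupled_frequency_hz(m_instr, k_instr, m_body, k_body):
    """Natural frequency (Hz) when body and instrument move together at the contact.

    Assumes a rigid contact, so masses add and the two springs act in parallel.
    """
    return natural_frequency_hz(m_instr + m_body, k_instr + k_body)


# Hypothetical values: a light, stiff interface and a heavier, softer hand.
M_INSTR, K_INSTR = 0.05, 2000.0   # kg, N/m
M_BODY = 0.30                     # kg
for k_body in (200.0, 800.0, 3200.0):   # relaxed to stiffened grip, N/m
    f = coupled_frequency_hz(M_INSTR, K_INSTR, M_BODY, k_body)
    print(f"k_body = {k_body:6.0f} N/m -> coupled mode at {f:.2f} Hz")
```

In this simplified picture, stiffening the body-side spring raises the frequency of the coupled mode, which is one way a player could tune an oscillation without driving it cycle by cycle.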

Given that neural conduction velocities and the speed of cognitive processes may be quite slow compared to the rates at which potential and kinetic energy can be exchanged between two interconnected mechanical elements, certain behaviors in the coupled musician-instrument dynamics can be attributed to an inner loop, not involving closed-loop control by the musician's nervous system. In particular, neural conduction delays and cognitive processing times on the order of 100 ms would preclude stable control of a lightly damped oscillator at more than about 5 Hz [17], yet rapid piano trills exceeding 10 Hz are often used in music [18]. The existence of compliance in the muscles of the finger and the rebound of the piano key are evidently involved in an inner loop, while muscle activation is likely the output of a feedforward control process.
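One way to see why a delay of this size limits closed-loop control, offered here as a back-of-the-envelope sketch rather than the analysis in [17], is that a pure time delay adds phase lag in proportion to frequency. At 180 degrees of lag a corrective action arrives in antiphase with the error it was meant to reduce. The function names below are assumptions made for this example.

```python
def delay_phase_lag_deg(freq_hz, delay_s):
    """Phase lag (degrees) that a pure time delay adds at a given frequency."""
    return 360.0 * freq_hz * delay_s


def phase_inversion_frequency_hz(delay_s):
    """Frequency at which a pure delay alone contributes 180 degrees of lag."""
    return 1.0 / (2.0 * delay_s)


DELAY_S = 0.100   # loop delay on the order of 100 ms, as quoted in the text
for f in (1.0, 5.0, 10.0):
    print(f"{f:4.1f} Hz -> {delay_phase_lag_deg(f, DELAY_S):5.1f} deg of lag")
print(f"180 deg reached at {phase_inversion_frequency_hz(DELAY_S):.1f} Hz")
```

With a 100 ms delay the 180 degree point falls at 5 Hz, consistent with the limit quoted above, so a 10 Hz trill cannot be regulated cycle by cycle through the outer neural loop.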

As we say, the musician is not playing the musical instrument but instead playing the coupled dynamics of his or her own body and instrument. Many instruments support musical techniques which are quite evidently examples of the musician driving oscillations that arise from the coupled dynamics of body and instrument mechanics. For example, the *spiccato* technique in which a bow is "bounced" on a string involves driving oscillatory dynamics that arise from the exchange of kinetic and potential energy in the dynamics of the hand, the bow and hairs, and the strings. Similarly, the exchange of kinetic and potential energy underlies the existence of oscillatory dynamics in a drum roll, as described above. It is not necessary for the drummer to produce muscle action at the frequency of these oscillations, only to synchronize driving action to these oscillations [6].

The interesting question to be considered next is whether the perspective we have introduced here may have implications for the design of digital musical instruments: whether design principles might emerge that make a musical instrument an extension of the human body and a means for the musician to express their musical ideas. It is possible that answering such a question might also be the key to codifying certain emerging theories in the fields of human motor control and cognitive science. While it has long been appreciated that the best machine interface is one that "disappears" from consciousness, a theory to explain such phenomena has so far been lacking.

The concept of dynamic coupling introduced here also suggests a means for a musician to learn to control an instrument. First, we observe that humans are very adept at controlling their bodies when not coupled to objects in the environment. Given that the new control challenge presented when the body is coupled to an instrument in part involves dynamics that were already learned, it can be said that the musician already has some experience even before picking up an instrument for the first time. Also, to borrow a term from robotics, the body is hyper-redundantly actuated and equipped with a multitude of sensors. From such a perspective, it makes sense to let the body be backdriven by the instrument, because only then do the redundant joints become engaged in controlling the instrument.

An ideal musical instrument is a machine that extends the human body. From this perspective, it is the features in a musical instrument's control interface that determine whether the instrument can express the player's motor intent and support the development of manual skill. We propose that questions of digital instrument design can be addressed by carefully considering the coupling between a neural system, biomechanical system, and instrument, and even the environment in which the musical performance involving the instrument takes place. Questions can be informed by thinking carefully about a neural system that "knows" how to harness the mechanics of the body and object dynamics and a physical system that can "compute in hardware" in service of a solution to a motor problem.

The human perceptual system is aligned not only to extracting structure from signals (or even pairs of signals) but to extracting structure from pairs of signals known to be excitations and responses (inputs and outputs). What the perceptual system extracts in that case is what the psychologist J. J. Gibson refers to as "invariants" [19]. According to Gibson, our perceptual system is oriented not to the sensory field (which he terms the "ambient array") but to the structure in the sensory field, the set of signals which are relevant in the pursuit of a specific goal. For example, in catching a ball, the "signal" of relevance is the size of the looming image on the retina and indeed the shape of that image; together these encode both the speed and angle of the approaching ball. Similarly, in controlling a drum roll, the signal of relevance is the rebound from the drumhead which must be sustained at a particular level to ensure an even roll. The important thing to note is that for the skilled player, there is no awareness of the proximal or bodily sensation of the signal. Instead, the external or "distal" object is taken to be the signal's source. In classical control, such a structured signal is represented by its generator or a representation of a system known to generate such a structured signal.

Consider for a moment a musician who experiences a rapid oscillation-like behavior arising from the coupling of his or her own body and an instrument, perhaps the bounce of a bow on a string, or the availability of a rapid re-strike on a piano key due to the function of the repetition lever. Such an experience can generally be evoked again and again by the musician learning to harness such a behavior and develop it into a reliable technique, even if it is not quite reliable at first. The process of evoking the behavior, by timing one's muscle actions, would almost certainly have something to do with driving the behavior, even while the behavior's dynamics might involve rapid communication of energy between body and instrument as described above. Given that the behavior is invariant to the mechanical properties of body and instrument (insofar as those properties are constant) it seems quite plausible that the musician would develop a kind of internal description or internal model of the dynamics of the behavior. That internal model will likely also include the possibilities for driving the behavior and the associated sensitivities.

In his pioneering work on human motor control, Nicolai Bernstein has described how the actions of a blacksmith are planned and executed in combination with knowledge of the dynamics of the hammer, workpiece, and anvil [20]. People who are highly skilled at wielding tools are able to decouple certain components of planned movements, thereby making available multiple "loops" or levels of control which they can "tighten" or "loosen" at will. In the drumming example cited above, we have seen that players can similarly control the impedance of their hand and arm to control the height of stick bounces (the speed of the drum roll), while independently controlling the overall movement amplitude (the loudness of the drum roll).

Interestingly, the concept of an internal model has become very influential in the field of human motor behavior in recent years [21] and model-based control has become an important sub-discipline in control theory. There is therefore much potential for research concerned with exploring the utility of model-based control for musical instruments, especially from the perspective that the model internalized by the musician is one that describes the mechanical interactions between his or her own body and the musical instrument. This chapter is but a first step in this direction. Before leaving the questions we have raised here, however, we will briefly turn our attention to how the musician might learn to manage such coupled dynamics, proposing that the robustness, immediacy, and potential for virtuosity associated with acoustic instrument performance is derived in large part from engaging interactions that involve both the active and passive elements of the sensorimotor system.

## **2.5 Implications of a Coupled Dynamics Perspective on Learning to Play an Instrument**

At the outset of this chapter, we proposed that successful acoustic instruments are those which are well matched, in terms of their mechanical impedance, to the capabilities of our bodies. In other words, for an experienced musician, the amount of work they need to do to produce a desired sound is within a range that will not exhaust their muscles on the one hand but which will provide sufficient push-back to support control on the other. But what about the case for someone learning an instrument? What role does the dynamic behavior of the instrument play in the process of learning? Even if we do not play an instrument ourselves, we are probably all familiar with the torturous sound of someone learning to bow a violin, or with our own exhausting attempts to get a note out of a garden hose. This is what it sounds and feels like to struggle with the coupled dynamics of our bodies and an instrument whose dynamical behavior we have not yet mastered. And yet violins can be played, and hoses can produce notes, so the question is how does someone learn to master these behaviors?

Musical instruments represent a very special class of objects. They are designed to be manipulated and to respond, through sound, to the finest nuances of movement. As examples of tools that require fine motor control, they are hard to beat. And, as with any tool requiring fine motor control, a musician must be sensitive to how the instrument responds to an alteration in applied action with the tiniest changes in sound and the tiniest changes in haptic feedback. Indeed, a large part of acquiring skill as a musician is being able to predict, for a given set of movements and responses, the sound that the instrument will make and to adjust movements, in anticipation or in real time, when these expectations are not met.

The issue, as Bernstein points out, is that there are often many ways of achieving the same movement goal [20]. In terms of biomechanics, joints and muscles can be organized to achieve an infinite number of angles, velocities, and movement trajectories, while at the neurophysiological level, many motor neurons can synapse onto a single muscle and, conversely, many muscle fibers can be controlled by one motor unit (see Sect. 3.2 for more details concerning the hand). This results in a biological system for movement coordination that is highly adaptive and that can support us in responding flexibly to perturbations in the environment. In addition, as Bernstein's observations of blacksmiths wielding hammers demonstrated, our ability to reconfigure our bodies in response to the demands of a task goal extends to incorporating the dynamics of the wielded tool into planned movement trajectories [20, 22]. Indeed, it is precisely this ability to adapt our movements in response to the dynamics of both the task and the task environment that allows us to acquire new motor skills.

Given this state of affairs, how do novice musicians (or indeed experienced musicians learning new pieces) select from all the possible ways of achieving the same musical outcome? According to Bernstein's [20] theory of graded skill acquisition, early stages of skill acquisition are associated with "freezing" some biomechanical degrees of freedom (DoF), for example joint angles. Conversely, later (higher) stages are characterized by a more differentiated use of DoF ("freeing"), allowing more efficient and flexible/functional performance. This supposition is consistent with experimental results in which persons adopted a high impedance during early stages of learning (perhaps removing DoF from the coupled dynamics) and transitioned to a lower impedance once the skill was mastered [23].

More recently, Ranganathan and Newell [24, 25] proposed that in understanding how and why learning could be transferred from one context to another, it was imperative to uncover the dynamics of the task being performed and to determine the "essential" and "non-essential" task variables. They define non-essential variables as the whole set of parameters available to the performer and suggest that modifications to these parameters lead to significant changes in task performance. For example, in throwing an object the initial angle and velocity would be considered non-essential variables, because changes to these values will lead to significant changes in the task outcome. The essential variables are a subset of the available working parameters that are bound together by a common function. In the case of throwing an object, this would be the function that relates the goal of this particular throwing task to the required throwing angle and velocity [26]. The challenge, as Pacheco and Newell point out, is that in many tasks this information is not immediately available. Therefore, the learner needs to engage in a process of discovery or "exploration" of the available dynamic behaviors to uncover, from the many possible motor solutions, which will be the most robust. But finding a motor solution is only the first step since learning will only occur when that movement pattern is stabilized through practice [27].

In contrast to exploration, stabilization is characterized as a process of making movement patterns repeatable, a process which Pacheco and Newell point out can be operationalized as a negative feedback loop, where both the non-essential and essential execution variables are corrected from trial to trial. Crucially, Pacheco and Newell determined that, for learning and transfer to be successful, the time spent in the exploration phase and the time spent in the stabilization phase must be roughly equal [26].
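The throwing example and the trial-to-trial negative feedback loop described above can be illustrated with a minimal sketch. It assumes an ideal projectile launched from ground level and a simple gradient-based correction rule; both are illustrative assumptions and are not taken from the cited studies. The names `throw_range` and `stabilize`, the gain and the starting values are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def throw_range(speed, angle):
    """Horizontal distance of an ideal projectile launched from ground level."""
    return speed ** 2 * math.sin(2.0 * angle) / G


def stabilize(target, speed, angle, gain=0.5, trials=30):
    """Correct speed and angle from trial to trial using the signed outcome error.

    The correction rule is an assumption made for illustration only. It mixes
    units (m/s and rad) in one norm, which is a deliberate simplification.
    """
    errors = []
    for _ in range(trials):
        error = throw_range(speed, angle) - target
        errors.append(error)
        # Sensitivity of the outcome to each execution variable.
        d_speed = 2.0 * speed * math.sin(2.0 * angle) / G
        d_angle = 2.0 * speed ** 2 * math.cos(2.0 * angle) / G
        norm = d_speed ** 2 + d_angle ** 2
        # Negative feedback: step against the error along the sensitivities.
        speed -= gain * error * d_speed / norm
        angle -= gain * error * d_angle / norm
    return speed, angle, errors


# Two learners start from different throws but aim for the same 5 m target.
speed_a, angle_a, errors_a = stabilize(5.0, speed=8.0, angle=0.5)
speed_b, angle_b, errors_b = stabilize(5.0, speed=10.0, angle=0.2)
```

Under these assumptions both runs reduce the outcome error, yet they settle on different combinations of speed and angle. This is one way to picture the many motor solutions that satisfy the same function relating the task goal to throwing angle and velocity.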

As yet, we have little direct evidence of these phases of learning of motor skill in the context of playing acoustic musical instruments. A study by Rodger et al., however, suggests that exploration and stabilization phases of learning may be present as new musical skills are acquired. In a longitudinal study, they recorded the ancillary (or non-functional) body movements of intermediate-level clarinetists before and after learning a new piece of music. Their results demonstrated that the temporal control of ancillary body movements made by participants was stronger in performances after the music had been learned and was closer to the measures of temporal control found for an expert musician's movements [28]. While these findings provide evidence that the temporal control of musicians' ancillary body movements stabilizes with musical learning, the lack of an easy way to measure the forces exchanged across the mechanical coupling between player and instrument means that we cannot yet empirically demonstrate the role that learning to manage the exchange of energy across this contact might play in supporting the exploration and stabilization of movements as skill is acquired. Nevertheless, the fact that haptic feedback plays a role for the musician in modeling an instrument's behavior has already been demonstrated experimentally using simulated strings [29] and membranes [11, 30]. In both cases, performance of simple playing tasks was shown to be more accurate when a virtual haptic playing interface was present that modeled the touch response of the instrument (see also Chap. 6).

As a final point, we suggest that interacting with a digital musical instrument that has simulated dynamical behavior is very different from interacting with an instrument with a digitally mediated playing interface that only supports information exchange. As an extreme example, while playing keyboard music on a touch screen might result in a performance that retains note and timing information, it is very difficult, if not impossible, for a player to perform at speed or to do so without constantly visually monitoring the position of their hands. Not only does the touch screen lack the mechanical properties of a keyboard instrument, it also lacks the incidental tactile cues such as the edges of keys and the differentiated height of black and white keys that are physical "anchors" available as confirmatory cues for the player.

In summary, a musical instrument interface that incorporates dynamic coupling not only provides instantaneous access to a second channel of information about its state, but, because of the availability of cues that allow for the exploration and selection of multiple parameters available for control of its state, such an interface is also likely to support the development of higher levels of skill and musical expressiveness.

## **2.6 Conclusions**

In this chapter, we have placed particular focus on the idea that the passive dynamics of the body of a musician play an integral role in the process of making music through an instrument. Our thesis, namely that performer-instrument interaction is, in practice, a dynamic coupling between a mechanical system and a biomechanical instrumentalist, repositions the challenge of playing an instrument as a challenge of "playing" the coupled dynamics in which the body is already involved. The idea that an instrument becomes an extension of the player's body is quite concrete when the coupled dynamics of instrument and player are made explicit in a model. From a control engineering perspective, the body-/instrument-coupled dynamics form an inner feedback loop; the dynamics of this inner loop are to be driven by an outer loop encompassing the player's central nervous system. This new perspective becomes a call to arms for the design of digital musical instruments. It places a focus on the haptic feedback available from an instrument, the role of energy storage and return in the mechanical dynamics of the instrument interface, and the possibilities for control of fast dynamic processes otherwise precluded by the use of feedback with loop delay.

This perspective also provides a new scaffold for thought on learning and skill acquisition, as we have only briefly explored. When approached from this perspective, skill acquisition is about refining control of one's own body, as extended by the musical instrument through dynamic coupling. Increasing skill becomes a question of refining control or generalizing previously acquired skills. Thus, soft-assembly of skill can contribute to the understanding of learning to play instruments that express musical ideas. The open question remains: what role does the player's perception of the coupled dynamics play in the process of becoming a skilled performer? Answering this question will require us to step inside the coupled dynamics of the player/instrument system. With the advent of new methods for on-body sensing of fine motor actions and new methods for embedding sensors in smart materials, the capacity to perform such observations is now within reach.

## **References**



# **Chapter 3 A Brief Overview of the Human Somatosensory System**

**Vincent Hayward**

**Abstract** This chapter provides an overview of the human somatosensory system. It is the system that subserves our sense of touch, which is so essential to our awareness of the world and of our own bodies. Without it, we could not hold and manipulate objects dextrously and securely, let alone musical instruments, and we would not have a body that belongs to us. Tactile sensations, conscious or unconscious, arise from the contact of our skin with objects. It follows that the mechanics of the skin and of the hand in its interaction with objects is the source of information that our brain uses to dextrously manipulate objects, as in music playing. This information is collected by a vast array of mechanoreceptors that are sensitive to the effects of contacting objects, often with the fingers, even far away from the region of contact. This information is processed by neural circuits in numerous regions of the brain to provide us with extraordinary cognitive and manipulative functions that depend so fundamentally on somatosensation.

## **3.1 Introduction**

The overarching purpose of the somatosensory system is to inform the brain of the mechanical state of the body that it inhabits. It shares this function with the vestibular system. But whereas the vestibular system operates in the low-dimensional space of head translations and rotations, the somatosensory system takes its input from almost the entire body. The main sources of information arise in part from the load-bearing structures represented by connective tissues such as tendons and ligaments, in part from the motion-producing tissues, the muscles, and in part from the outer layers of the body, that is, the skin. As a result, unlike the vestibular system, which is sensitive to the movements of a rigid body (the cranium), the somatosensory system relates to mechanical domains that are in essence deformable bodies. This explains why, despite the fact that the two systems share the same overall task, they differ fundamentally. The vestibular inputs arise from small, easily identifiable organs in the inner ears, since it is the low-dimensional description of the movements of a rigid body that is of interest. In contrast, the somatosensory system relates to what is essentially an infinite-dimensional solid (and liquid) domain and depends on the changes of its internal mechanical state to infer the properties of the objects that are being touched such as their weight, the substance they are made of, or the existence and nature of the relative movement of the body in relation to external objects [35, 74]. In other words, it is a distributed system in the physical sense that its mechanical state is described by (tensor) fields rather than vectorial quantities. This basic fact is of course reflected in its general organisation where very large populations of specific detectors are found in all load-bearing and load-producing tissues. That is not to say that the somatosensory system is unique in its reliance on large populations of sensors. This is also true of all sensory systems, including vision, audition, taste/olfaction and the vestibular system.

V. Hayward (✉)

Sorbonne Universités, Université Pierre et Marie Curie, Institut des Systèmes Intelligents et de Robotique, Paris, France e-mail: vincent.hayward@upmc.fr

© The Author(s) 2018 S. Papetti and C. Saitis (eds.), *Musical Haptics*, Springer Series on Touch and Haptic Systems, https://doi.org/10.1007/978-3-319-58316-7\_3

The haptic function depends on several systems of large organs. In an adult person, the skin's mass can reach two kilograms and part of its functions is mechanosensing. However, it must be kept in mind that most of the body's soft and connective tissues are mechanosensitive and associated with abundant innervation. The exact contributions of the different mechanoreceptive channels to the formation of haptic percepts remain today to be established.

Recent research has revealed a number of rather surprising findings. For example, most textbooks teach that the sense of a limb's relative position is mediated by mechanoreceptors embedded in the muscles. However, recent research has shown conclusively that the awareness of limb position is also mediated by sensory inputs arising from the skin [20, 21]. Similarly, it is often assumed that the perceived quality of the surfaces of objects is the exclusive result of cutaneous inputs. Recently, it has been shown that complete abolishment of distal cutaneous input, resulting from trauma or anaesthesia, had negligible effect on participants' ability to discriminate the roughness of surfaces [53]. This could be explained by the fact that friction-induced vibrations taking place at the fingertip propagate far inside the anatomy, at least up to the forearm [15], stimulating large populations of mechanoreceptors that might not be located in the skin and that can be quite remote from the locus of mechanical input [69].

These observations demonstrate that the study of the haptic function must be discussed from different perspectives where individual components should not be assigned one-to-one relationships, largely because the sensing organ, as alluded to in the previous paragraph, is by physical necessity distributed in the entire body and not even just at its surface.

## **3.2 Biomechanics of the Hand**

## *3.2.1 Hand Structural Organisation*

David Katz described the hand as a 'unitary organ' where the sensory and motor functions take place together [48]. The hand is not the only organ in the body that has this particularity. The foot is in many ways similar to the hand, but configured for locomotion rather than manipulation. Both organs possess an abundantly articulated skeletal structure held together by connective ligaments in the form of joint capsules and tendons that are connected to muscles located remotely in the forearm or the leg. In turn, these muscles insert in the arm and leg bones, and thus, a single tendon path can span up to four joints with the wrist and the three phalangeal joints. To give a sense of scale of the biomechanical complexity of the hand and the foot, it suffices to consider that phalanges receive four tendon insertions except for the distal phalanges that receive only two. Some tendons insert in several bones, and most tendons diverge and converge to form a mechanical network. The hand and the foot also have the so-called intrinsic muscles that insert directly into small bones, notably for the thumb, with some of these intrinsic muscles not inserting in any bones but in tendons only. Thus, if one considers bones, tendons and muscles to be individual elements, all connectivity options (one-to-one, one-to-several, several-to-one) are represented in the biomechanical structure of the hand, foot and limbs to which they are attached.

## *3.2.2 Hand Mobility*

It is tempting to think of the hand as an articulated system of bodies connected with single-degree-of-freedom joints that guide their relative displacements. This simple picture is quite incorrect on two counts. The first is that skeletal joints are never 'simple' in the sense that they allow movements that ideal 'lower pairs,' such as simple hinges, would not. In biomechanics, one seldom ventures to quote a precise number of degrees of freedom which, depending on the authors, can vary from 10 to more than 60 when speaking of the hand only. The biomechanical reality suggests that the kinematic mobility of the hand is simply six times the number of bones considered, but the actual functional mobility suggests that certain joint excursions have a much greater span than others. One could further argue that, save for nails, since the hand interacts with objects through soft tissues, its true mobility is infinite dimensional [35], a problem we shall return to when discussing the sensing capabilities of the hand.

The most productive approach to make sense of this complexity is, counterintuitively, to augment the complexity of the system analysed and to also include the sensorimotor neural control system in its description. In effect, the mechanics of the hand mean nothing without the considerable amount of neural tissue and attending sophisticated neural control that is associated with it. In this perspective, the concept of 'synergies' was put forward long ago by the pioneers of the study of movement production and control (Joseph Babinski 1857–1932, Charles Scott Sherrington 1857–1952, Nikolai Bernstein 1896–1966, and others) and has received much study since.

Loosely speaking, the idea behind this concept is that movements with a purpose (be it sensory, manipulative, locomotive or communicative) are highly organised. Each of these purposes is associated with the coordinated action of groups of muscles through time, but, importantly, the number of these purposes is small compared to the number of all possible movements. The purposes can include reaching, grasping, feeling, drawing, stepping, pressing on keys, sliding on strings or plucking them, bending notes, and, crucially, they can be combined and chained together to yield complex behaviours orchestrated by the central nervous system. The entire sensorimotor system, much of which is dedicated to the hand, is implemented following a hierarchical organisation with nuclei in the dorsal column, the brain stem, the midbrain, the cerebellum and ultimately several cortical regions. The considerable literature on the subject can be approached through recent books and surveys [10, 51, 67].

## *3.2.3 The Volar Hand*

The inside region of the hand is named 'volar' by opposition to the 'dorsal' region. The volar region is of primary interest since it is the interface where most of the haptic interactions take place. Detecting a small object, say a sewing needle lying on a smooth surface, is absolutely immediate with the fingertip but more difficult with other volar hand regions, and the same object will go undetected by any other part of the body, including the dorsal hand region. It is also evident that the sensitive volar skin is mechanically very different from what is often called the 'hairy skin' covering the dorsal region. The most conspicuous feature is the presence of ridges, that is, of a clearly organised micro-geometry that is not seen elsewhere, except in the plantar region of the foot. In fact, the so-called 'glabrous' skin differs from the 'hairy' skin in four important properties.

Pulp: The glabrous skin is never very close to, nor very far from, a bone. In the fingertip and elsewhere in the hand, it is separated from the bone by a relatively uniform distance of 3 or 4 mm. The space in between is densely filled by a special type of connective tissue called the pulp [33]. This fibrous tissue is crucial to give the volar hand its manipulative and sensorial capabilities since a fingertip can take a load of several hundred newtons without damage and *simultaneously* detect a needle. The pulp gives the skin the ability to conform to the touched object by enlarging the contact surface, which is mainly independent of the load past a certain value [68]. Incidentally, this simple fact makes it evident that the notion of 'force' or even of 'pressure' must be taken carefully when speaking of tactile sensory performance (see Sect. 4.2).


## *3.2.4 Bulk Mechanics of the Fingertip and the Skin*

The glabrous skin covering the volar region of the hand is, quite visibly, neither an isotropic nor a homogeneous medium. It is apparent that the ridges introduce preferred directions that facilitate certain types of deformations. The effect of static punch indentation on the human fingertip can be made visible by imaging the shape of finger contact with a flat surface when a small object, such as a guitar string, is trapped at the interface (see Fig. 3.1).

The detailed local properties of the ridged skin were investigated in vivo by Wang and Hayward [79] by loading approximately 0.5 mm² regions of skin. Unsurprisingly, the measurements revealed great anisotropy according to the ridge orientation when the skin is stimulated in traction, that is, in its natural mode of loading (see Fig. 3.2). On the other hand, the elastic properties of the ridged skin seem to be by and large unaffected by factors such as the individual and the thickness of the stratum corneum. Detailed in vivo measurement can also be performed using optical coherence tomography (OCT) or elastography [24, 52], obtaining results similar to those found by direct mechanical stimulation. These findings point out how uncertain it is to predict the properties of tissues across length and timescales. The viscoelastic properties of the ridged skin are dominated by two characteristic times, one very short, of the order of one millisecond, and the other much longer, of the order of several seconds [79], which shows, like the peripheral neural system introduced below, that the mechanical somatosensory system operates at several timescales.

**Fig. 3.1 a** A punch indenting an ideal solid half-space follows the Boussinesq–Flamant's deformation problem, where the elongation follows the pattern indicated by the black line and the shear deformation that of the grey line. **b** Imaging the contact surface indicates that an actual finger grossly follows this pattern. However, a 2 mm indentation made by a 1 mm punch creates a deformation region as large as 6 mm that does not have a circular shape, owing to the anisotropy of the skin introduced by the ridges. Figure from [36]

**Fig. 3.2** Equivalent material properties of human ridged skin along and across ridge direction (solid lines) for eight different people. For most, the equivalent elasticity in elongation depends strongly on the ridge direction and different people can have very different skins. However, when the deformation is dominated by shear, then it is much less dependent on load orientation and on individuals. Figure from [79]

Also of relevance to the design of haptic interfaces is some knowledge of the bulk mechanical properties of the extremities, taken as a whole. Again, this subject is better tackled in terms of specific tasks. When the human finger interacts with a surface, three modes of interaction may be combined: (i) a contact can be made to or released from a surface; (ii) the finger can displace the mutual surface of contact through a rolling motion; (iii) or it can do so through a sliding motion [34, 35]. Each of these modes corresponds to specific mechanics. When contact is made, the contact surface grows very fast with normal loading, and normal displacement is accompanied by a very steep increase in the contact force. To wit, a 1 mm indentation of the fingertip by a flat surface corresponds to a normal load of less than 0.2 N, but at 2 mm the normal load is already at least five times larger at 1.0 N, and it takes only a further increment of 0.5 mm to reach the value of 5.0 N [68]. Concomitantly, the contact area has reached half of its ultimate value for only 0.5 N of load, and past 1.0 N, it will not increase significantly, regardless of the load [68]. This suggests that representing a fingertip by a local convex elastic homogeneous solid is far from being an acceptable model in terms of its ability to conform to the gross shape of touched objects. Moreover, these properties are very much dependent on the speed at which indentation occurs. Pawluk and Howe found that the mechanical response curve under similar conditions varied greatly with speed: a 1.0 mm indentation applied at 0.2 mm/s causes a loading of about 0.2 N, as just mentioned, but the same displacement applied at 80 mm/s causes a contact loading of 1.0 N [63].
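The indentation figures quoted above can be turned into a short arithmetic check of how quickly the fingertip stiffens. The sketch below treats the three quoted points as exact, using 0.2 N as an upper bound at 1 mm, and computes the secant stiffness between consecutive points. The function name `secant_stiffness` is hypothetical, and the piecewise-linear reading of the data is an assumption made for illustration.

```python
# (indentation in mm, normal load in N), as quoted in the text from [68].
# 0.2 N is an upper bound at 1 mm, so the first stiffness is a lower bound.
POINTS = [(1.0, 0.2), (2.0, 1.0), (2.5, 5.0)]


def secant_stiffness(points):
    """Average stiffness (N/mm) between consecutive indentation points."""
    return [
        (f1 - f0) / (x1 - x0)
        for (x0, f0), (x1, f1) in zip(points, points[1:])
    ]


stiffness = secant_stiffness(POINTS)

# Rate dependence at a fixed 1.0 mm indentation, from [63]:
# about 0.2 N at 0.2 mm/s versus 1.0 N at 80 mm/s.
rate_ratio = 1.0 / 0.2
```

Read this way, the average stiffness rises from roughly 0.8 N/mm between 1 and 2 mm to 8 N/mm over the next 0.5 mm, and a 400-fold increase in indentation speed multiplies the load at 1 mm by about five.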

Most frequently, the finger interacts with a rigid object, which either is oscillating and/or provides the surface on which the finger slides, in all cases generating oscillations in the finger pad. Such occurrences are common during music playing. To model and explain these interactions, it is essential to have a model of the bulk mechanics of the fingertip for small displacements and over the whole range of frequencies relevant to touch, that is, DC to about 1 kHz. In the low frequencies, the data can be extracted from studies performed in the condition of slow mechanical loading, transient loading or large displacements [29, 40, 62], but a recent study conducted with the aid of a novel mechanical impedance measurement technique [82] has shown that a fingertip, despite all the complexities of its local mechanics, may be considered as a critically damped mass-spring-damper system with a corner frequency of about 100 Hz and where the contribution of inertia to the interaction force is negligible at all frequencies compared with elasticity and viscosity [81] (see Fig. 3.3). In essence, the fingertip is dominantly elastic below 100 Hz and dominantly viscous above this frequency. In the high frequencies (≥400 Hz), the fingers exhibit structural dynamics that have an uncertain origin. Quite surprisingly, the fingertip bulk elasticity (of the order of 1 N/mm), viscosity (of the order of 1 N s/m) and equivalent inertia (of the order of 100 mg) are by and large independent of a tenfold variation of the normal load. It can be surmised that these properties hold true for all volar regions of the hands and feet.
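The transition from elastic to viscous behaviour around 100 Hz can be sketched with a spring and damper in parallel, neglecting inertia as the text suggests. In the sketch below the stiffness is the quoted order of magnitude (1 N/mm), while the damping coefficient is not a measured value: it is chosen, as an assumption, so that the elastic and viscous force amplitudes are equal at 100 Hz. The function names and the 10 µm displacement amplitude are hypothetical.

```python
import math

STIFFNESS = 1000.0   # N/m, i.e. 1 N/mm (order of magnitude quoted in the text)
CORNER_HZ = 100.0    # Hz, corner frequency quoted in the text
# Assumption: damping chosen so that both terms balance at the corner frequency.
DAMPING = STIFFNESS / (2.0 * math.pi * CORNER_HZ)  # N s/m


def force_amplitudes(freq_hz, displacement_m=1e-5):
    """Elastic and viscous force amplitudes for a sinusoidal displacement."""
    omega = 2.0 * math.pi * freq_hz
    elastic = STIFFNESS * displacement_m
    viscous = DAMPING * omega * displacement_m
    return elastic, viscous


def dominant_term(freq_hz):
    """Name the larger of the two force contributions at a given frequency."""
    elastic, viscous = force_amplitudes(freq_hz)
    return "elastic" if elastic > viscous else "viscous"


for freq in (10.0, 50.0, 200.0, 500.0):
    print(freq, dominant_term(freq))
```

With these assumed values the damping comes out at about 1.6 N s/m, and the elastic term dominates below the corner frequency while the viscous term dominates above it.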

Friction is arguably the most important aspect of the haptic function since without it we could scarcely feel and manipulate objects. Because the finger is a biological, living object, it has properties which often escape our intuition, especially concerning its frictional properties, the latter having a major impact on the manipulative motor function as well as on its detection and discriminative function [1]. All the aforementioned mechanosensitive sensors in the skin and deep tissues are in fact likely to respond to friction-induced phenomena. A good example is that any attenuation of the sensitivity of these receptors, for example by a situation as banal as cold or dry hands, invariably results in an increase in the grip force as a strategic response of the brain to sensory deficit. This was also documented when fingers are dry since dry skin is more slippery [2]. As another example, recent studies in hedonic touch have established a link between the sensation of pleasantness and the skin's tribological properties that in turn influence the physics of contact [47].

**Fig. 3.3** Fingerpad impedance for small displacements. Figure from [81]

Some key points should be kept in mind. First, the notion of coefficient of friction in biotribology must be complemented by the notion of load index, which describes the dependency between the net normal load and the net traction, since in most cases of practical importance Amontons' first law, stating that the friction force is proportional to the normal load, does not hold. A second point is the importance of the presence of water in the physics of the contact, owing to the fact that keratin is the building material of the stratum corneum. Keratin is akin to hydrophilic polymers, with the effect that traction increases with the presence of water despite the reduction of the interfacial shear strength. This is true up to a point where excess water hydrodynamically decreases friction, in competition with the former effect. A third complicating factor is that the presence of water plasticises the stratum corneum, with the consequence of dramatically increasing the effective contact area, which is a phenomenon that occurs at the molecular level [19]. A fourth factor is the very large effect of time on the frictional dynamics. In fact, all four factors dominate the generation of traction as opposed to the normal gripping load, in direct opposition to the simplistic friction models adopted in the great majority of neuroscience and robotic studies [1]. Furthermore, this physics depends completely on the counter surface interacting with the fingers, where the material properties, the roughness of the surface and its structural nature (say wood) interact with the physiology of sudation (perspiration) through an autonomic function performed by the brain [2].
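The role of the load index can be sketched with a power law relating net traction to net normal load, a form commonly used in biotribology. The exponent and prefactor below are hypothetical placeholders, not measured values, and the function names are introduced only for illustration.

```python
def traction(normal_load, k=1.0, load_index=1.0):
    """Net traction (N) for a net normal load (N), assuming F_t = k * N**n.

    k and load_index (n) are illustrative assumptions; n = 1 recovers a
    constant coefficient of friction equal to k.
    """
    if normal_load < 0:
        raise ValueError("normal load must be non-negative")
    return k * normal_load ** load_index


def apparent_friction_coefficient(normal_load, k=1.0, load_index=1.0):
    """Ratio of traction to normal load; constant only when load_index is 1."""
    if normal_load <= 0:
        raise ValueError("normal load must be positive")
    return traction(normal_load, k, load_index) / normal_load


if __name__ == "__main__":
    for load in (0.5, 1.0, 2.0, 5.0):
        mu_linear = apparent_friction_coefficient(load, load_index=1.0)
        mu_skin = apparent_friction_coefficient(load, load_index=0.7)
        print(f"N = {load} N: linear {mu_linear:.2f}, index 0.7 {mu_skin:.2f}")
```

When the load index is below one, the apparent coefficient of friction falls as the normal load grows, which is why a single coefficient cannot summarise the contact.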

## **3.3 Sensory Organs**

## *3.3.1 Muscles, Tendons and Joints*

Muscles are primarily elastic systems that develop a tensional force that depends on several factors, among which are their activation level and their mechanical state, often simplified to just a length. At rest, a muscle behaves passively, like a nonlinear spring that becomes stiffer at the end of its range. When activation is increased from rest to full activation, the active contribution to the passive behaviour is greatest at midrange. As a result, for a given activation level, a muscle loses tonus if it is too short or too long. A muscle that shortens at high speed produces very little tension, while a lengthening muscle gives a greater tension, like a one-way damper. It must be noted that the neuromuscular system takes several hundreds of milliseconds to modulate the activation. Therefore, beyond a few hertz, the passive portion of the dynamics dominates. Skeletal muscles are in great majority organised in agonist–antagonist systems [84]. These terms describe the fact that separate muscles or muscle groups accelerate or prevent movement by contracting and relaxing in alternation. It is nevertheless a normal occurrence that muscle groups are activated simultaneously, a behaviour termed co-contraction or co-activation. Co-contraction, which results in a set of muscle tensions reaching a quasi-equilibrium around one or more joints, enables new functions, such as stabilisation of unstable tasks [8]. The behaviour of an articulation operating purely in an agonist or antagonist mode is nevertheless very different from that of the same articulation undergoing co-contraction.

A consequence of co-contraction which is relevant to our subject is to stiffen the entire biomechanical system. This can be made evident when grasping an object. Take for instance a ruler between the thumb and the index finger, grip it loosely and note the frequency of the pendulum oscillation. Tightening the grip results in a net increase of this frequency as a consequence of the stiffening of all the tissues involved, including the muscles that are co-contracting: a tighter grip better resists a perturbation. This also means that the musculoskeletal system can modulate stiffness at a fixed position, for instance when grasping. This observation calls for treating any linear model of the musculoskeletal system with great circumspection.
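The ruler experiment can be sketched as a small-angle pendulum with an added torsional spring standing in for the grip. This is a deliberately linear caricature, to be read with the circumspection urged above: the ruler is assumed to be a uniform rod pivoting at its gripped end, and the mass, length and grip stiffness values are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def ruler_frequency(grip_stiffness, mass=0.03, length=0.3):
    """Small-angle oscillation frequency (Hz) of a ruler pivoting at one end.

    grip_stiffness is an assumed torsional stiffness of the grip (N m/rad);
    mass (kg) and length (m) are illustrative values for a light ruler.
    """
    if grip_stiffness < 0:
        raise ValueError("grip stiffness must be non-negative")
    inertia = mass * length ** 2 / 3.0           # uniform rod about its end
    gravity_stiffness = mass * G * length / 2.0  # restoring torque per radian
    total_stiffness = gravity_stiffness + grip_stiffness
    return math.sqrt(total_stiffness / inertia) / (2.0 * math.pi)


if __name__ == "__main__":
    for kappa in (0.0, 0.01, 0.05, 0.2):
        print(f"grip stiffness {kappa} N m/rad: {ruler_frequency(kappa):.2f} Hz")
```

In this sketch a loose grip leaves the frequency close to that of a free pendulum, and any increase in grip stiffness raises it.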

We can now see how this system can contribute to the sensation of the weight of objects, since one of the strategies employed by people in the performance of this perceptual task is to aim at reaching a static equilibrium where velocity tends towards zero, a condition that must be detected by the central nervous system. When it comes to heaviness, it has also been noticed many times that subjects tend to adopt a second strategy in which rapid oscillations are performed around a point of equilibrium. In the latter case, it is possible to suppose that it is the variation of effort as a function of movement and of its derivative that provides information about the mass (and not about the weight). Muscles are connected to the skeleton by tendons, which also have mechanoreceptors called the Golgi organs. These respond to the stress to which they are subjected and report it to the central nervous system, which is thus informed of the effort applied by the muscles needed to reach a static or dynamic equilibrium.
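The difference between the two strategies can be sketched numerically. Assuming an idealised object moved vertically along a sinusoid, the supporting force is m(g + a); regressing force on acceleration then yields the mass as the slope and the weight as the intercept. The amplitude, frequency and function names are illustrative assumptions, and nothing here is meant to describe how the nervous system actually performs the estimate.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def simulate_effort(mass, amplitude=0.02, freq_hz=2.0, n=200, duration=1.0):
    """Return (acceleration, supporting force) samples for an oscillated mass."""
    w = 2.0 * math.pi * freq_hz
    samples = []
    for i in range(n):
        t = duration * i / n
        acc = -amplitude * w * w * math.sin(w * t)  # vertical acceleration
        samples.append((acc, mass * (G + acc)))     # force needed to hold it
    return samples


def fit_line(samples):
    """Ordinary least-squares slope and intercept of force against acceleration."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    mass_estimate, weight_estimate = fit_line(simulate_effort(2.3))
    print(f"mass: {mass_estimate:.3f} kg, weight: {weight_estimate:.3f} N")
```

The static strategy corresponds to reading the intercept alone, whereas the oscillatory strategy exposes the slope, which depends on inertia rather than on gravity.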

The joints themselves include mechanoreceptors. They are located in the joint capsule, which is a type of sleeve made of a dense network of connective tissues wrapping around a joint and containing the synovial fluid. These receptors, the so-called Ruffini corpuscles, respond to the deformation of the capsule and appear to play a key role when the joint approaches the end of its useful range of movement, in which case some fibres of the capsule begin stretching [28].

The sensory organs of the musculoskeletal system give us the opportunity to introduce a major categorisation within the fauna of mechanoreceptors, namely rapidly adapting (RA) and slowly adapting (SA) receptors. The distinction is made on a simple basis. When an RA receptor is stimulated by undergoing a deformation, it responds with a volley of action potentials whose duration and density are driven directly by the rate of change of the stimulus, just like a high-pass filter would (but direct analogies with linear filters should be avoided). When an SA-type receptor is deformed, it responds for the whole duration of the stimulus but is rather insensitive to the transient portion, and in that resembles a low-pass filter including the zero-frequency component.
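Purely as a caricature, and bearing in mind the warning against direct analogies with linear filters, the two response types can be contrasted by passing a ramp-and-hold indentation through first-order low-pass and high-pass filters. The filter coefficient and stimulus shape are arbitrary assumptions; real afferents emit action potentials and are strongly nonlinear.

```python
def low_pass(signal, alpha=0.2):
    """First-order low-pass filter: follows the sustained level (SA-like)."""
    out, state = [], 0.0
    for x in signal:
        state += alpha * (x - state)
        out.append(state)
    return out


def high_pass(signal, alpha=0.2):
    """Complementary high-pass filter: responds to change only (RA-like)."""
    return [x - y for x, y in zip(signal, low_pass(signal, alpha))]


def ramp_and_hold(n_rest=20, n_ramp=20, n_hold=100, level=1.0):
    """Indentation that is zero, then ramps up, then is held constant."""
    ramp = [level * (i + 1) / n_ramp for i in range(n_ramp)]
    return [0.0] * n_rest + ramp + [level] * n_hold


if __name__ == "__main__":
    stimulus = ramp_and_hold()
    sa_like = low_pass(stimulus)
    ra_like = high_pass(stimulus)
    for i in (10, 30, 40, 139):
        print(i, round(stimulus[i], 3), round(sa_like[i], 3), round(ra_like[i], 3))
```

The low-pass output persists for as long as the indentation is held, whereas the high-pass output appears during the ramp and decays to zero during the hold.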

This distinction is universal and is as valid for the receptors embedded in ligaments and capsules (SA) as for those located in muscles and in the skin (SA and RA). To pursue the analysis of the perception of object properties, such as shape, we can realise that the joints too are involved in this task, since any muscular output and any resulting skeletal movement have an effect on the joints in the form of extra loading, relative sliding of structures and connective tissue deformation. This observation illustrates the conceptual difficulties associated with the study of the haptic system, namely that it is practically impossible to associate a single stimulus with an anatomical classification of the sources of information.

## *3.3.2 Glabrous, Hairy and Mucosal Skin*

The body surface is covered with skin. As mentioned above, it is crucial to distinguish three main types of skin having very different attributes and functions. The mucosal skin covers the 'internal' surfaces of the body and is in general humid. The gums and the tongue are capable of vitally important sensorimotor functions [7, 39, 75]. The tongue's capabilities are astonishing: it can detect a large number of object attributes including size, shape, very small curvature radii, hardness and others. Briefly, one may speculate that the sensorimotor abilities of the tongue are sufficient to instantly detect any object likely to cause mechanical injury in case of ingestion (grains of sand, fish bones).

The glabrous skin has a rather thick superficial layer made of keratin (like hairs) which is not innervated. The epidermis, right under it, is living and has a special geometry such that the papillae of the epidermal–dermal junction are twice as frequent as the print ridges. The folds of the papillae house receptors called Meissner corpuscles, which are roughly as frequent in the direction transversal to the ridges as in the longitudinal direction. The Merkel complexes (which comprise a large number of projecting arborescent neurites) terminate on the apex of the papillae matching the corresponding ridge, called the papillary peg. The hairy skin does not have such a deeply sculptured organisation. In addition, each hair is associated with muscular and sensory fibres that innervate an organ called the hair follicle.

This geometry can be better appreciated if considered at several length scales and from different angles. A fingerprint shows that the effective contact area is much smaller than the touched surface. The distribution of receptors is closely related to the geometry of the fingerprint. In particular, the spatial frequency of the Meissner corpuscles is twice that of the ridges. On the other hand, the spatial frequency of the arborescent terminations of the Merkel complexes is the same as that of the ridges. This geometry explains why the density of Meissner corpuscles is roughly five times greater than that of the Merkel complexes [37, 45, 55, 59]. Merkel complexes, however, come in two types. The other type forms long chains that run on the apex of the papillae [60]. The distinctive tree-like structure of this organ terminates precisely at the dermal–epidermal interface.

It is useful to perform simple experiments to realise the differences in sensory capabilities between glabrous and hairy skin. It suffices to get hold of a rough surface, such as a painted wall or even sandpaper, and to compare the experience when touching it with the fingertip or with the back of the hand. One can also get hold of a Braille text and try to read it with the wrist. The types of receptors seem to be similar in both kinds of skin, but their distribution and the organisation and biomechanical properties of the respective skins vary enormously. One can guess that the receptor densities are greatest in the fingertips. There, we can have an idea of their density when considering that the distance between the ridges of the glabrous skin is 0.3–0.5 mm.

The largest receptor is the Pacinian corpuscle. It is found in the deeper regions of the subcutaneous tissues (several mm) but also near the skin, and its density is moderate, approximately 300 in the whole hand [11, 71]. It is large enough to be seen with the naked eye, and its distribution seems to be opportunistic and correlated with the presence of main nervous trunks rather than functional skin surfaces [32]. Receptors of this type have been found in a great variety of tissues, including the mesentery, but near the skin they seem to have a very specific role, that of vibration detection. The Pacinian corpuscle allows us to introduce a key notion in physiology, that of specificity or 'tuning'. It is a common occurrence in all sensory receptors (be they chemoreceptors, photoreceptor cells, thermoreceptors or mechanoreceptors) that they are tuned to respond to certain classes of stimuli. The Pacinian corpuscle does not escape this rule since it is specific to vibrations, maximising its sensitivity for a stimulation frequency of about 250 Hz but continuing with decreasing sensitivity to 1000 Hz. It is so sensitive that, under passive touch conditions, it can detect vibrations of 0.1 µm present at the skin surface [78]. Even higher sensitivity was measured for active touch: results addressing a finger-pressing task are reported in Sect. 4.2.
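To relate this displacement threshold to the velocity and acceleration figures often used for vibration, a sinusoidal vibration can be assumed, in which case the amplitudes scale with 2πf and (2πf)² respectively. Pairing the 0.1 µm amplitude with the 250 Hz frequency of peak sensitivity is an assumption made here for illustration; the function names are hypothetical.

```python
import math


def velocity_amplitude(displacement_m, freq_hz):
    """Peak velocity (m/s) of a sinusoid with the given peak displacement."""
    return 2.0 * math.pi * freq_hz * displacement_m


def acceleration_amplitude(displacement_m, freq_hz):
    """Peak acceleration (m/s^2) of a sinusoid with the given peak displacement."""
    return (2.0 * math.pi * freq_hz) ** 2 * displacement_m


if __name__ == "__main__":
    d = 0.1e-6  # 0.1 micrometre, expressed in metres
    f = 250.0   # assumed stimulation frequency, Hz
    print(f"velocity: {velocity_amplitude(d, f):.3e} m/s")
    print(f"acceleration: {acceleration_amplitude(d, f):.3f} m/s^2")
```

Under these assumptions the peak acceleration is about 0.25 m/s², a small fraction of the acceleration of gravity.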

The Meissner corpuscle, being found in great numbers in the glabrous skin, plays a fundamental role in touch. In the glabrous skin, it is tucked inside the 'dermal papillae', and thus in the superficial regions of the dermis, but nevertheless mechanically connected to the epidermis via a dense network of connective fibres. Therefore, it is the most intimate witness of the most minute skin deformations [72]. One may have some insight into its size by considering that its 'territory' is often bounded by sweat pores [55, 60].

Merkel complexes, in turn, rather than being sensitive axons tightly packed inside a capsule, have tree-like ramifications that terminate near discoidal cells, the so-called Merkel cells. In the hairy skin, these structures are associated with each hair. They are also abundant in mucosal membranes. In the glabrous skin, they have up to 50 terminations for a single main axon [30]. The physiology of Merkel cells is not well understood [54]. They are thought to participate in mechanotransduction together with the afferent terminals, providing these with a unique firing pattern. In any case, Merkel complexes are associated with slowly adapting responses, but their functional significance is still obscure since some studies show that they can provide a Pacinian-type synchronised response up to 1500 Hz [27].

The Ruffini corpuscle, which we already encountered while commenting on joint capsules, has the propensity to associate itself with connective tissues. Recently, it has been suggested that its role in skin-mediated touch is minor, if not non-existent, since glabrous skin seems to contain very few of them [58]. This finding was indirectly supported by a recent study implicating the Ruffini corpuscle not in mechanical stimulation due to direct contact with the skin, but rather in the connective tissues around the nail [5]. Generally speaking, the Ruffini corpuscle is very hard to identify and direct observations are rare, even in glabrous skin [12, 31].

Finally, the so-called C fibres, without any apparent structure, innervate not only the skin but also all the organs in the body, and are associated with pain, irritation and also tickling. These non-myelinated, slow fibres (about 1 m/s) are also implicated in conscious and unconscious touch [76]. It is however doubtful that the information that they provide participates in the conscious perception of objects and surfaces (shape, size, or weight for instance). These properties invite the conclusion that the information of the slow fibres participates in affective touch and in the development of conscious self-awareness [56].

From this brief description of the peripheral equipment, we can now consider the receptors that are likely to play a role in the perception of external mechanical loading. As far as the Ruffini corpuscles are concerned, several studies have shown that the joints, and hence the receptors located there, provide proprioceptive information, that is, an estimation of the mechanical state of the body (relative limb position, speed, loading). It is also possible that they are implicated in the perception of the deformation of deep tissues which occurs when manipulating a heavy object. It might be surprising, but the central nervous system becomes aware of limb movements not only through the musculoskeletal system and the joints, but also through the skin and subcutaneous tissues [22].

It is clear that the receptors that innervate the muscles also have a contribution to make, since at the very least the nervous system must either control velocity to zero, or else estimate it during oscillatory movements. Muscles must transmit an effort able to oppose the effects of both gravity and acceleration in the inertial frame. Certainly, Golgi organs, which are located precisely on the load path, would provide information, but only if the load to be gauged is significantly larger than that of the moving limb. Lastly, the gauged object in contact with the hand would deform the skin. From this deformation, hundreds of mechanoreceptors would discharge, some transitorily when contact is made, some in a persisting fashion.

At this point, it should be clear that the experience of the properties of an object, such as its lack of mobility, is really a 'perceptual outcome' arising from complex processing in the nervous system and relying on many different cues, none of which alone would be sufficient to provide a direct and complete measurement of any particular property. This phenomenon is all the more remarkable since an object, say a saxophone, seems to have the same weight when it is held with the arms stretched out, squeezed between two hands, held by the handle with a dangling arm, or held in two arms, among other possibilities, each of these configurations involving distinct muscle groups and providing the nervous system with completely different sets of cues!

## *3.3.3 Electrophysiological Response*

#### **3.3.3.1 Categories of Responses**

The idea behind the study of the electrophysiological response is to measure directly the signals transmitted by the neurons, the so-called action potentials. This measurement can be done by inserting electrodes in peripheral nerves, something that can be done in people without measurable consequences for health. It was when making such measurements that it was realised that there existed the two types of responses already mentioned, slowly adapting (SA) and rapidly adapting (RA), the latter also termed fast adapting (FA). It is nevertheless important to distinguish the capacity of a given receptor to respond to fast stimuli from its type of response.

For the receptors located in the skeletomuscular system, it is relatively easy to determine their response mode from the anatomy, but in the skin this is not possible. Mechanoreceptors, with the exception of the Pacinian corpuscle, are very small and very dense, and recording is only possible at some distance (wrist, arm, leg). The consensus is that the Ruffini corpuscles (not observed in the glabrous skin) are of the SA type and so are the Merkel complexes. On the other hand, the Meissner corpuscle is of the FA type.

Some of these inferences are made by stimulating the skin with *von Frey filaments*, named after Max von Frey, who introduced them at the end of the nineteenth century as a calibrated method to stimulate touch. Using this method, it is possible to determine that certain afferent nerve fibres respond to stimulation of a tightly limited territory, say of a size of 2 mm (type I), while some others respond to stimulation applied within a much wider territory, up to one centimetre in size, or more (type II). This distinction, physiological rather than anatomical, gives rise to four possibilities: FA-I, FA-II, SA-I, SA-II. The receptive fields are very varied in shape and size throughout the surface of the body, frequently overlapping, and often they do not have clear borders [42, 43, 46, 77].
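The two-by-two classification can be written out explicitly as a small lookup. The pairing of each class with an end organ follows the conventional textbook assignment rather than anything established in this section, and it should be read with the caveat that the distinction is physiological, not anatomical; all names in the sketch are hypothetical.

```python
from itertools import product

ADAPTATION = ("FA", "SA")  # fast (rapidly) adapting, slowly adapting
FIELD = ("I", "II")        # small, sharply limited field vs wide field


def afferent_class(adaptation, field):
    """Combine an adaptation type and a receptive-field type into a label."""
    if adaptation not in ADAPTATION:
        raise ValueError(f"unknown adaptation type: {adaptation!r}")
    if field not in FIELD:
        raise ValueError(f"unknown receptive-field type: {field!r}")
    return f"{adaptation}-{field}"


ALL_CLASSES = [afferent_class(a, f) for a, f in product(ADAPTATION, FIELD)]

# Conventional (assumed) association between afferent class and end organ.
CONVENTIONAL_END_ORGAN = {
    "FA-I": "Meissner corpuscle",
    "FA-II": "Pacinian corpuscle",
    "SA-I": "Merkel complex",
    "SA-II": "Ruffini corpuscle",
}

if __name__ == "__main__":
    for label in ALL_CLASSES:
        print(label, "->", CONVENTIONAL_END_ORGAN[label])
```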

Most mechanical phenomena at play, however, are nonlocal: detecting a 1 mm<sup>2</sup> crumb with the finger has mechanical consequences that spread over up to 100 mm<sup>2</sup> of skin tissue, and sliding the finger on a surface with 10 µm asperities has easily measurable consequences up to the forearm [15, 69]. In that sense, it is highly probable that most motor and perceptual behaviours simultaneously engage all mechanoreceptor populations [66].

#### **3.3.3.2 Coding Options**

It stands to reason that the flow of action potentials must be able to encode information arising from peripheral stimulation. Before proceeding further, it is important to recall that information ascending from the periphery is not the only source that determines the conscious experience, far from it. In fact, self-generated movement [13], intention [85] and learning [17], not counting stimuli coming from other sensory modalities [18, 34], all modify the conscious percept arising from the same stimulation.

A number of codes have been discovered that represent information arising from touch and kinaesthesia neurally. It is likely that many more will be discovered in the future. As far as kinaesthetic information is concerned, it was found that the specific recruitment of nerve fibres spatially encodes the position of a joint [9]. With regard to the direction of movement, it seems plain that the agonist–antagonist organisation of the motor system encodes it automatically. The muscle spindles respond specifically to velocity by a frequency code: the larger the change of length per unit of time (that is, speed), the higher the number of nerve impulses (or action potentials) per unit of time. This code has the property of being resistant to noise and perturbations: an action potential missed or fired accidentally does not make a great difference over a long period of time. On the downside, this code is by construction not temporally precise because it takes a minimum number of action potentials to encode a rate.
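The trade-off of a frequency code can be sketched by counting spikes in an observation window. The perfectly regular spike train, the 50 spikes-per-second rate and the window lengths are illustrative assumptions; real spike trains are irregular.

```python
def regular_spike_train(rate_hz, duration_s):
    """Evenly spaced spike times (s), offset by half a period from t = 0."""
    period = 1.0 / rate_hz
    return [(i + 0.5) * period for i in range(int(duration_s * rate_hz))]


def estimate_rate(spike_times, window_s):
    """Rate estimate (spikes/s) from the spike count in [0, window_s)."""
    count = sum(1 for t in spike_times if 0.0 <= t < window_s)
    return count / window_s


def rate_resolution(window_s):
    """Change in the estimate caused by one extra or one missing spike."""
    return 1.0 / window_s


if __name__ == "__main__":
    train = regular_spike_train(50.0, 1.0)
    missing_one = train[1:]  # the first spike is lost
    for window in (1.0, 0.02):
        print(window, estimate_rate(train, window), estimate_rate(missing_one, window))
```

Over a one-second window, a lost spike shifts the estimate by only one spike per second, whereas over a 20 ms window the same loss removes the estimate altogether: the code is robust only at the price of temporal precision.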

As far as touch is concerned, codes are still mysterious but a few have been found. For low-intensity stimulation, certain FA receptors behave like oscillators synchronised with the waveform [65], which corresponds to a temporal code. In touch, it is also clear that spatial coding is fundamental. For instance, when reading Braille each dot specifically stimulates a small population of receptors which convey the presence of the dot [26]. The shape of a touched object can be directly coded by the contact surface [49]. Other codes, however, are likely to be at play. When a fingertip is mechanically loaded in the tangential direction, ramping from rest to a maximal value (an event that occurs each time we pick up an object), it was shown that this event is represented by a correlation code [41]. This means that it is the temporal coincidence of two or more action potentials that conveys the nature of the mechanical interaction between the finger and the object. It has also been shown that when a finger slips on a surface with a single asperity, action potentials are synchronised with the encounter of this asperity with each ridge of the print, which corresponds to an extremely fine spatiotemporal code [50].

During gripping, the recruitment code has also been documented as coding directly in skin coordinates [26]. A similar observation can also be made of curvature, since the ratio between the contact surface and the normal load depends on it [25]. It is highly probable that sliding, sticking and the transitions between these two states are coded by the relative response of RA and SA populations, which is another form of correlation [70]. Another important attribute of a contact detected by touch is simply the average load, namely its direction and magnitude in the normal and tangential directions [4], which suggests that information is generally coded by receptor populations and not by individual receptors. It is also probable that the elastic properties of the touched object are coded peripherally and specifically by composite populations in space and time. Last but not least, the coding of texture, or rather of the micro-geometry of surfaces that interact with the glabrous skin, was the subject of a considerable number of studies [38]. Despite these works, it is likely that most of the codes employed by primates remain to be discovered.

The question of codes can also be considered from the viewpoint of the physiological response of receptors. Unfortunately, this approach is fraught with numerous difficulties. It is very rare that one can stimulate one particular receptor specifically and measure its response. Since stimulation can only be effected from the surface of the skin, even the most concentrated indentations have consequences far away from the contact site: deformation propagates several millimetres around the zone of stimulation [14]. As a result, it is generally impossible to associate a physiological response with a particular anatomical characteristic.

Due to its size, the Pacinian corpuscle is nevertheless an exception because it is possible to study its response in vitro [3, 6]. It has interesting characteristics, some of which are shared with Merkel complexes [27]. The first peculiarity is a frequency-dependent sensitivity: the deformation needed to trigger a single action potential is smallest at 250 Hz. In this condition, the discharge of action potentials is synchronous with the stimulation, giving a direct temporal code. If amplitude is reduced, the corpuscle loses this synchronicity property but still responds over several cycles to truly microscopic deformations. This feature translates into a transfer function with a strong, obvious nonlinear jumping behaviour. For a given frequency, the response does not change with amplitude over a range, but once a threshold is reached, a frequency doubling is observed.

Taking the example of the perception of the weight of an instrument, it should become increasingly clear that such perception does not result from a single or simple family of neural signals, but from a veritable jungle of motor and sensory signals whose conscious perception is that of a unitary percept attributed to the held object. This could help explain why the motor system and the perceptual system seem to operate independently from each other, at least when it comes to the conscious knowledge of either action or perception [23, 64].

## **3.4 Central Organs**

It is not easy to paint a concise and logical picture of the central nervous organisation of the haptic system. Besides, it would be misleading to believe that it can be confined to a small number of functionally and anatomically well-delimited cortical areas, ganglia and pathways. The discovery of this organisation is a work in progress. Its elements were originally uncovered through the random consequences of war, accidents, diseases and surgical innovations, and are studied today with electrophysiology (in humans, but mostly in monkeys and rats) and brain imaging techniques (PET, fMRI, and very recently optical imaging). It can be said that the representation that is made of this organisation constantly changes with the introduction of new techniques.

Nevertheless, it is useful to have a general idea of the great structures [44]. Sensory pathways ascend through the spine and first project on dorsal column nuclei which in turn project onto the ventral posterior nucleus of the thalamus, located at the apex of the spine, right at the centre of the cranium. Many functions are ascribed to the thalamus, but one of them is to transmit all sensory afferent information (with the exception of olfaction and vestibular inputs) to the cortical regions. This organ seems to be able to process peripheral information into a form that is suitable for cortical processing.

The somatosensory cortex is located on both sides of the great parietal circumvolution, and a huge number of fibres project onto it. The cortex is divided into two main areas, SI (primary) and SII (secondary), on each side of the central parietal sulcus. According to Brodmann's nomenclature [86], SI is divided into four areas: 1, 2, 3a and 3b, based on their neuronal architectures. Thalamic fibres terminate for the most part in 3a and 3b which are, in turn, connected to areas 1 and 2, portraying a hierarchical organisation where, as in the other sensory modalities, increasingly abstract representations are successively formed. It is believed, for instance, that area 1 is implicated in the representation of textures, that area 2 encodes size and shape, and that areas 3a and 3b are dedicated to lower-level processing. It has been discovered that two other areas of the posterior parietal region, 5 and 7, are also involved in haptic processing. In any case, the somatotopic organisation progressively reduces with the distance from peripheral inputs.

## **3.5 Conclusions**

The somatosensory system is distributed throughout the entire body with mechanical, anatomical and physiological attributes that vary greatly with the regions considered. These variations can be explained by the mechanical function of each organ: the fingertip is very different from, say, the elbow, the lips or the tongue. It is therefore tempting to relate these attributes to common motor functions, such as gripping, throwing objects, eating or playing musical instruments.

## **References**



# **Chapter 4 Perception of Vibrotactile Cues in Musical Performance**

## **Federico Fontana, Stefano Papetti, Hanna Järveläinen, Federico Avanzini and Bruno L. Giordano**

**Abstract** We suggest that studies on active touch psychophysics are needed to inform the design of haptic musical interfaces and better understand the relevance of haptic cues in musical performance. Following a review of the previous literature on vibrotactile perception in musical performance, two recent experiments are reported. The first experiment investigated how active finger-pressing forces affect vibration perception, finding significant effects of vibration type and force level on perceptual thresholds. Moreover, the measured thresholds were considerably lower than those reported in the literature, possibly due to the concurrent effect of large (unconstrained) finger contact areas, active pressing forces, and long-duration stimuli. The second experiment assessed the validity of these findings in a real musical context by studying the detection of vibrotactile cues at the keyboard of a grand and an upright piano. Sensitivity to key vibrations in fact not only was highest at the lower octaves and gradually decreased toward higher pitches; it was also significant for stimuli having spectral peaks of acceleration similar to those of the first experiment, i.e., below the standard sensitivity thresholds measured for sinusoidal vibrations under passive touch conditions.

F. Fontana (✉)
Dipartimento di Scienze Matematiche, Informatiche e Fisiche, Università di Udine, via delle Scienze 206, 33100 Udine, Italy e-mail: federico.fontana@uniud.it

S. Papetti · H. Järveläinen
ICST—Institute for Computer Music and Sound Technology, Zürcher Hochschule der Künste, Pfingsweidstrasse 96, 8005 Zurich, Switzerland e-mail: stefano.papetti@zhdk.ch

H. Järveläinen e-mail: hanna.jarvelainen@zhdk.ch

F. Avanzini
Dipartimento di Informatica, Università di Milano, via Comelico 39, 20135 Milano, Italy e-mail: federico.avanzini@di.unimi.it

B. L. Giordano
Institut de Neurosciences de la Timone UMR 7289, Aix-Marseille Université-Centre National de la Recherche Scientifique, 13005 Marseille, France e-mail: brungio@gmail.com

## **4.1 Introduction**

As we have seen in Chap. 3, the somatosensory system relies on input from receptors that operate within deformable human tissues. One solution for measuring their activity precisely is to keep those tissues free from any kinematic perturbation. Such experiments, in which subjects were typically stimulated with vibrations at selected areas of their skin while remaining still, laid the foundations of the psychophysics of passive touch. However, as Gibson observed in 1962, "passive touch involves only the excitation of receptors in the skin and its underlying tissue," while "active touch involves the concomitant excitation of receptors in the joints and tendons along with new and changing patterns in the skin" [24]. This observation suggests that the psychophysics of active touch may exhibit relevant differences from the passive case. Furthermore, a systematic investigation of active touch psychophysics presents additional practical difficulties in experimental settings due to interactivity, which may explain the current lack of results in the field. Even if we assume a small and well-defined vibrating contact at the fingertip, any change in this contact, as typically found in finger actions such as sliding or pressing, gives rise to new normal and longitudinal forces acting on the skin and to different contact areas. Such side effects are indeed known to alter the tactile percept [9, 10, 28, 34, 36, 54]. The surrounding skin regions, which contribute to tactile sensations, are also dynamically affected by such changes and by the patterns of vibrations propagating across them [49].

The perception of vibrations generated by musical instruments during playing is no exception to the above mechanisms. In fact, the respective experimental scenario is conceptually even more complicated and technically challenging. While in general tactile stimuli may be controlled reasonably well in active touch psychophysics experiments, when considering instrumental performance one has to take into account that vibrations are elicited by the subjects themselves while playing and that concurrent auditory feedback may affect tactile perception [30, 46, 50, 59].

As explained in Chap. 2, a tight closed loop is established between musicians and their instruments during performance. Experimentation on active touch in the context of musical performance hypothesizes that tactile feedback affects such interaction in a number of ways and eventually has a role in the production of musical sounds.

## *4.1.1 Open-Loop Experimentation*

The study of haptic properties of musical instruments outside of the musician–instrument interaction (i.e., in open loop) conceptually simplifies the experimental design, while effectively preparing the ground for further studies in closed loop.

The violin, due to its intimate contact with the player, represents one of the most fascinating instruments for researchers in musical haptics. A rich literature has grown to explain the physical mechanisms at the base of its range of expressive features [60]. However, the mechanical coupling of the violin with the performer is strong, so that its vibratory response measured in free-suspension conditions cannot fully represent the vibrotactile cues generated by the instrument when in use [38].

The vibratory response of the piano is relatively easier to assess, as the instrument's interface with the musician is limited to the keyboard and pedals. Furthermore, the mass of the piano is such that the mechanical coupling with the performer's limbs cannot affect its vibrations significantly. However, pianos couple with the floor; hence, vibrations can reach the pianist's body through it and the seat. Piano vibrations have been carefully studied by researchers in musical acoustics, who measured them mainly at the strings or soundboard [51]. In contrast, keyboard vibrations as conveyed to the player have been less researched. In the early 1990s, Askenfelt and Jansson performed extensive measurements on several stringed instruments, including the double bass, violin, guitar, and piano [4]. Overall, vibration amplitude was measured above the standard sensitivity thresholds for passive touch [54], suggesting a role for tactile feedback at least in conveying a feeling of a resonating and responding object. This conclusion, though, was mitigated for the piano keyboard, whose vibration amplitude was mostly found below such thresholds and hence supposedly perceptually negligible. More recently, Keane and Dodd reported significant differences between upright and grand piano keyboard vibrations, while hypothesizing a perceptual role of vibrotactile feedback during piano playing [32].

Other classes of instruments, such as aerophones, likely offer measurable vibrotactile cues to the performer, but to our knowledge a systematic assessment of the perceivable effects of such vibratory feedback has not yet been conducted.

Percussion instruments, on the other hand, respond with strong kinesthetic feedback that is necessary for performers to rearm their limbs instantaneously and to execute rebounds and rolls without strain. In this regard, Dahl suggested that the interaction of a drumstick or a hand with the percussion point happens so rapidly that it does not seem possible for a performer to adjust a single hit simultaneously with the tactile feedback coming from it [11]. The percussive action, in other words, appears to be purely feed-forward as long as multiple hit sequences are not considered (see also Sect. 2.2 in this regard). Finally, electroacoustic and electronic instruments do not seem able to generate relevant vibrotactile feedback, unless a loudspeaker system is mounted directly aboard them.

## *4.1.2 Experiments with Musicians*

Once an instrument has been identified as a source of relevant tactile cues, their potential impact on musical performance and produced musical sound may be tested with musicians. The inclusion of human participants, however, introduces several issues. To start with, as mentioned above, interactive contexts such as the musical one prevent the implementation of experiments with full control over contact areas and forces, or the generation of vibratory stimuli. Also, acoustical emissions from musical instruments engage musicians in a multisensory process where the tactile and auditory channels are entangled at different levels, ranging from the peripheral and central nervous system, to cross-modal perceptual and cognitive processes. Tactile and auditory cues start to interfere with each other in the middle ear. Vibrations in fact propagate from the skin to the cochlear system through bone and tendon conduction, via several pathways [12]. Especially if an instrument is played close to the ear (e.g., a violin) or enters into contact with large areas of the body (e.g., a cello or double bass), such vibrations can reach the cochlea with sufficient energy to produce auditory cues. Cochlear by-products of tactile feedback may be masked by overloading the hearing system with sufficiently loud sound that does not correlate with tactile feedback: Masking noise provided through headphones is often necessary in tactile perception tests [6, 58]. The use of bone-conduction headphones may improve experimental control, as bone-conducted cues could be jammed on their way to the cochlea by vibratory noise transferred to the skull [47]. Even when considering only airborne auditory feedback, earmuffs or earplugs may not provide sufficient cutoff, and uncorrelated masking noise may be needed. The question, then, is how to analyze answers from musicians who had to perform while listening to loud noise. 
The literature on audio-tactile sensory integration is particularly rich and can help explain possible perceptual synergies or cancellations occurring during this integration [46, 50, 57, 58].

Any tactile interaction experiment that involves musicians should take the aforementioned issues into account. In a groundbreaking study from 2003, Galembo and Askenfelt showed that grand pianos are mainly recognized, and possibly even rated, based on the tactile and kinesthetic feedback offered by their keyboards, more than based on the produced sound [20]. Similarly, in a later study on percussive musical gestures, Giordano et al. showed that haptic feedback has a bigger influence on performance than on auditory cues [25]. Focusing on tactile cues alone, Keane and Dodd reported a significant preference of pianists for an upright instrument whose keybed had been modified to decrease vibration intensity at the keyboard, thus making it comparable to that produced by a grand piano [31, 32]. In parallel, some authors of the present chapter augmented a digital piano with synthesized vibrotactile feedback, showing that it significantly modified the performer's preference [16, 18]. In the same period, one of the world's top manufacturers equipped its flagship digital pianos with vibration transducers making the instruments' body vibrate while playing [27], thus testifying to concrete interest from the industry at least in the aesthetic value of tactile cues.

More recently, Wollman et al. showed that salient perceptual features of violin playing are influenced by vibrations at the violin's neck [59], and Altinsoy et al. found similar results using reproduced vibratory cues [3]. Saitis et al. discussed the influence of vibrations on quality perception and evaluation as manifested in the way that musicians conceptualize violin quality [48]. Further details on the influence of haptic cues on the perceived quality of instruments are given in Chap. 5.

## *4.1.3 Premises to the Present Experiments*

Compared to other interfaces of stringed instruments, the piano keyboard is easier to control experimentally, as the performer is only supposed to hit and then release one or more keys with one or more fingers. Other body contacts can be prevented by excluding the use of the pedals. Also, non-airborne auditory feedback (a by-product of the tactile response) can be masked by employing the techniques mentioned above. Furthermore, the sound and string vibrations produced by a key press are in good correspondence with the velocity with which the hammer hits a string [33]. If a keyboard is equipped with sensors complying with the MIDI protocol, then such a map is encoded for each key and made available as digital messages. Together, these properties allow the experimenter to (i) record the vibratory response of the keyboard to measurable key actions; (ii) create a database of reproducible action–response relationships; (iii) make use of those data in experiments where pianists perform simple tasks on the keyboard, such as hitting one or a few keys.

Our interest in the piano keyboard is not only motivated by its relatively easy experimental control: As mentioned above, its tactile feedback measured in open loop was found to be hardly above the standard vibrotactile sensitivity thresholds [4]. Did this evidence settle the question of whether piano keyboard vibrations are perceived? This chapter discusses and compares the results of two previously reported experiments on vibrotactile perception in active tasks: the first conducted in a controlled setting and the other in an ecological, musical setting. The goal was twofold: (i) to assess how finger pressing (similar to a key-press task) affected vibrotactile detection thresholds and (ii) to investigate whether pianists perceive keyboard vibrations while playing.

Somewhat surprisingly, in Experiment 1 we found sensitivity thresholds much lower than those previously reported for passive tasks. Experiment 2 demonstrated that pianists do perceive keyboard vibration, with detection rates highest at the lower octaves and gradually decreasing toward higher pitches. Importantly, vibrations at the piano keyboard were also measured with an accelerometer for the conditions used in the experiment: While their intensity was generally lower than the standard thresholds for passive touch, a comparison with the thresholds obtained in Experiment 1 provided a solid explanation for how pianists detected vibrations across the keyboard.

These findings suggest that studies on active touch psychophysics are required to better understand the relevance of haptic cues in musical performance and, consequently, to inform the development of future haptic musical interfaces.

## **4.2 Experiment 1: Vibrotactile Sensitivity Thresholds Under Active Touch Conditions**

In this experiment, vibrotactile perceptual thresholds at the finger were measured for several levels of pressing force actively exerted against a flat rigid surface [43]. Vibration of either sinusoidal or broadband nature and of varying intensity was provided in return. The act of pressing a finger is indeed a gesture found while performing on many musical instruments (e.g., keyboard, reed, and string instruments) and therefore represents a case study of wide interest for musical haptics. Based on the results reported by several previous studies [9, 10, 28, 34, 36], we expected perceptual thresholds to be influenced by the strength of the pressing force.

## *4.2.1 Setup*

A self-designed tabletop device called the Touch-Box was utilized to measure the applied normal force and area of contact of a finger pressing its top surface and to provide vibrotactile stimuli in return. Technical details on the device are given in Sect. 13.3.1. The Touch-Box was placed on a thick layer of stiff rubber, and sound emissions were masked by noise played back through headphones. To minimize variability of hand posture, an arm rest was used.

The experiment made use of two vibrotactile stimuli, implementing two different conditions: Band-passed white noise with 48 dB/octave cutoffs at 50 and 500 Hz and a sine wave at 250 Hz. Both stimuli focus around the range of maximal vibrotactile sensitivity (200–300 Hz [55]). During the experiment, stimulus amplitude was varied in fixed steps according to a staircase procedure (see Sect. 4.2.2). Stimulus level was calculated as the RMS value of the acceleration signal, accounting for the power of vibration acceleration averaged across the stimulation time.
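As a rough illustration of how such a stimulus level can be computed, the sketch below takes a sampled acceleration signal and returns its RMS level in dB, using the 10<sup>−6</sup> m/s<sup>2</sup> reference adopted in this chapter. The sampling rate, the peak amplitude, and the function names are assumptions chosen for the example; they are not values or code from the experiment.

```python
import math

REF_ACC = 1e-6  # m/s^2, the dB reference used in this chapter


def rms(samples):
    """Root mean square of a sequence of acceleration samples (m/s^2)."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def level_db(samples, ref=REF_ACC):
    """RMS acceleration level in dB relative to `ref`."""
    return 20.0 * math.log10(rms(samples) / ref)


# Hypothetical stimulus: 250 Hz sine, 1.5 s long, sampled at 48 kHz,
# with an assumed peak acceleration of 0.01 m/s^2.
SAMPLE_RATE = 48_000
PEAK = 0.01
sine = [PEAK * math.sin(2 * math.pi * 250 * n / SAMPLE_RATE)
        for n in range(int(1.5 * SAMPLE_RATE))]

sine_level = level_db(sine)  # close to 20*log10(PEAK / sqrt(2) / REF_ACC)
```

Because the level is defined on the RMS value, the same function applies unchanged to the band-passed noise condition.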

Pressing force was a within-subject condition with three target levels, covering a range from light touch to hard press, while still being comfortable for participants [13], as well as compatible with forces found in instrumental practice [4]. In what follows, the three force levels are referred to as Low, Mid, and High, which correspond, respectively, to 1.9, 8, and 15 N, with a tolerance of ±1.5 N.

## *4.2.2 Procedure*

Twenty-seven subjects participated in the sinusoidal condition, and seventeen in the noise condition. They were 19–39 years old (mean = 26, SD = 4.5), and half of them were music students. The experiment lasted between 35 and 60 min, depending on the participants' performance, and a 1-minute break was allowed every 5 min to prevent fatigue.

**Fig. 4.1** Thresholds measured at three pressing force levels, for sinusoidal and noise vibrations. Error bars represent the standard error of the mean. Figure reprinted from [43]

Perceptual thresholds were measured using a one-up-two-down staircase algorithm with fixed step size (2 dB<sup>1</sup>) and eight reversals, and a two-alternative forced choice (2AFC) procedure. The method targets the stimulus level corresponding to a correct detection rate of 70.7% [35], estimated as the mean of the last six reversals of the up-down algorithm.
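The staircase rule described above can be sketched as follows. This is a minimal illustration and not the software used in the experiment: the starting level, the function names, and the simulated observers are all hypothetical. The level goes down one step after two consecutive correct answers and up one step after each incorrect answer; the estimate is the mean of the last six of eight reversal levels.

```python
import random
from statistics import mean


def staircase_1up2down(respond, start_db, step_db=2.0,
                       n_reversals=8, n_average=6):
    """Run a one-up-two-down staircase with a fixed step size.

    respond(level_db) -> bool, True if the trial was answered correctly.
    Returns (threshold estimate, list of reversal levels).
    """
    level = float(start_db)
    correct_in_a_row = 0
    last_direction = 0  # -1: last step went down, +1: up, 0: no step yet
    reversals = []
    while len(reversals) < n_reversals:
        if respond(level):
            correct_in_a_row += 1
            if correct_in_a_row < 2:
                continue  # two consecutive correct answers needed to go down
            direction = -1
        else:
            direction = +1
        correct_in_a_row = 0
        if last_direction != 0 and direction != last_direction:
            reversals.append(level)  # the track changed direction here
        last_direction = direction
        level += direction * step_db
    return mean(reversals[-n_average:]), reversals


# Hypothetical 2AFC observer: always correct above an assumed threshold,
# guessing (50% correct) below it.
rng = random.Random(0)
TRUE_THRESHOLD_DB = 75.0


def guessing_observer(level_db):
    return level_db >= TRUE_THRESHOLD_DB or rng.random() < 0.5


estimate, reversal_levels = staircase_1up2down(guessing_observer, start_db=90.0)
```

In the experiment, three such staircases (one per force level) were interleaved; the sketch runs a single track for clarity.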

Three staircases were implemented, each corresponding to a target force level, which were presented in interleaved and randomized fashion. Participants were instructed to use their dominant index finger throughout the experiment. A trial consisted of two subsequent finger presses, with vibration randomly assigned to only one of them. The participants' task was to identify which press contained the vibration stimulus. Before the observation interval began, an LCD screen turning green signaled that the requested force level had been stably reached.

## *4.2.3 Results*

As shown in Fig. 4.1, at each pressing force level thresholds for sinusoidal vibration were lower than for noise. For both vibration conditions, higher thresholds (i.e., worse detection performance) were obtained at the Low force condition, while at the other two force levels the thresholds were generally lower. The lowest mean threshold (68.5 dB RMS acceleration) was measured at the High force condition with sinusoidal vibration, and the highest at the Low force condition with noise vibration (83.1 dB); thresholds thus varied over a wide range across conditions. Individual differences were also large: The lowest and highest individual thresholds typically differ by about 20 dB in each condition.

<sup>1</sup>In the remainder of this chapter, vibration acceleration values expressed in dB use 10<sup>−6</sup> m/s<sup>2</sup> as a reference.

Perceptual thresholds were analyzed by means of a mixed ANOVA. A significant main effect was found for type of vibration (*F*<sub>1,41</sub> = 14.64, *p* < 0.001, generalized η<sup>2</sup> = 0.23) and force level (*F*<sub>2,82</sub> = 137.5, *p* < 0.0001, η<sup>2</sup> = 0.35), while the main effect of musical experience was not significant. Post hoc pairwise comparisons with Bonferroni correction (sphericity assumption was not violated in the within-subject force level factor) indicated that the Low force condition differed from both the Mid and High force conditions, for both vibration types (*t*(82) > 8.85, *p* < 0.0001 for all comparisons). For noise vibration, the difference between Mid and High force conditions was significant (*t*(82) = −3.17, *p* = 0.02), but the respective contrast for sinusoidal vibration was not (*t*(82) = 1.64, *p* > 0.05). The difference between sinusoidal and noise vibrations was significant for the Low (*t*(57.44) = 4.37, *p* < 0.001) and High (*t*(57.44) = 4.29, *p* < 0.001) force conditions, but not for the Mid force (*t*(57.44) = 1.85, *p* > 0.05).

## *4.2.4 Discussion*

Vibrotactile perceptual thresholds were found in the range 68.5–83.1 dB RMS acceleration, values that are considerably lower than those generally reported in the literature. Maeda and Griffin [36] compared acceleration thresholds from various studies addressing passive touch, finding that most of them are in the range 105–115 dB for sinusoidal stimuli ranging from 100 to 250 Hz. The lowest reported acceleration thresholds are 97–98.5 dB, for contact areas (probe size) ranging from 53 to 176.7 mm<sup>2</sup> [1, 2, 15]. It is worth noting that the widely accepted results by Verrillo [55] report lowest displacement thresholds of approximately −20 dB (re 10<sup>−6</sup> m) at 250 Hz, equivalent to about 105 dB RMS acceleration.<sup>2</sup>
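The equivalence quoted for Verrillo's threshold can be checked numerically with the sinusoidal conversion given in this chapter's footnote (acceleration equals displacement times (2π*f*)<sup>2</sup>, and RMS equals peak divided by √2). The short sketch below is only a worked example; the function name is hypothetical, and it assumes that the displacement level refers to peak displacement.

```python
import math

REF_ACC = 1e-6    # m/s^2, acceleration dB reference used in this chapter
REF_DISPL = 1e-6  # m, displacement dB reference


def displacement_db_to_acc_db_rms(displ_db, freq_hz):
    """Convert a sinusoidal displacement level (dB re 1e-6 m, assumed peak)
    to an RMS acceleration level (dB re 1e-6 m/s^2)."""
    displ_peak = REF_DISPL * 10 ** (displ_db / 20)        # metres, peak
    acc_peak = displ_peak * (2 * math.pi * freq_hz) ** 2  # m/s^2, peak
    acc_rms = acc_peak / math.sqrt(2)                     # m/s^2, RMS
    return 20 * math.log10(acc_rms / REF_ACC)


# -20 dB re 1e-6 m at 250 Hz comes out at roughly 104.8 dB,
# consistent with the "about 105 dB" quoted above.
verrillo_db = displacement_db_to_acc_db_rms(-20.0, 250.0)
```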

The main result of the present experiment is that vibrotactile sensitivity depends on the applied pressing force. Thresholds were highest at the Low force condition and decreased significantly at both Mid and High force levels. In good accordance with what was reported in a preliminary study [44], for noise vibration the lowest threshold was obtained at the Mid force condition, while at the Low and High conditions thresholds were higher, resulting in a U-shaped threshold contour with respect to the applied force. However, as shown in Sect. 13.3.1.4, the spectral centroid of the noise vibration generally shifted toward 300 Hz and higher frequencies for the Mid and High force conditions. Therefore, we suggest that the U-shape of the threshold-force curve might be partially due to the response of the Pacinian channel, which shows a U-shaped contour over the frequency range 40–800 Hz with maximum sensitivity in the 200–300 Hz range [8]. Conversely, for sinusoidal vibrations at 250 Hz, mean dB thresholds decreased roughly logarithmically for increasing pressing forces (see Fig. 4.1). This simpler trend may be due to the more consistent behavior of our system when reproducing simpler sinusoidal vibrations (see Sect. 13.3.1.4). An improved version of the Touch-Box would be needed to test whether a similar trend can be found when noise stimuli are reproduced more linearly for varying pressing forces.

<sup>2</sup>For a sinusoidal vibration signal *s*, it is straightforward to convert between acceleration and displacement: $s_{\text{acc}} = s_{\text{displ}} \cdot (2\pi f)^2$, where *f* is the frequency. Also, RMS values can be obtained directly from peak values: $s_{\text{RMS}} = s_{\text{peak}}/\sqrt{2}$.

Further studies are needed to precisely assess how vibratory thresholds might be affected by passive forces of strength equivalent to the active forces used in the present study. However, since the Low condition in our experiment was already satisfied by applying light pressing force (the measured mean is about 1.49 N), it may be compared to studies addressing passive static forces. Craig and Sherrick [10] found that increasing static force on the contactor produces an increase in vibrotactile magnitude. They considered vibration bursts at 20, 80, and 250 Hz lasting 1240 ms, contact areas up to 66.3 mm<sup>2</sup>, and static forces of about 0.12 and 1.2 N. Harada and Griffin [28] used a contact area of 38.5 mm<sup>2</sup> and found that forces in the range 1–3 N led to significant lowering of thresholds by 2–6 dB RMS at 125, 250, and 500 Hz. The lowest thresholds reported are however around 100 dB RMS acceleration. On the other hand, Brisben et al. [9] reported that passive static contact forces from 0.05 to 1.0 N did not have an effect on thresholds. However, with only four participants, the statistics of those results are not robust. Nevertheless, the authors suggested that extending these investigations to higher forces, as found in everyday life, would be important. They also hypothesized that increasing the force beyond 1–2 N could lower thresholds by better coupling of vibrating surfaces to bones and tendons, which could result in more effective vibration transmission to distant Pacinian corpuscles. That might also contribute to explain the generally lower thresholds that we found for higher forces. In our study, force level was found strongly correlated to contact area, resulting in larger areas for higher forces, which clearly contributed to further lowering perceptual thresholds [43].

Only a few related studies are found in the literature dealing with non-sinusoidal stimuli. Gescheider et al. [22] studied difference limens for the detection of changes in vibration amplitude, with either sinusoidal stimuli at 25 or 250 Hz or narrowband noise with spectrum centered at 175 Hz and 24 dB/octave falloff at 150 and 200 Hz (contact area 2.9 cm<sup>2</sup>). They found that the nature of the stimuli had no effect on difference limens.

Wyse et al. [61] conducted a study with hearing-impaired participants and found that, for complex stimuli and whole hand contact (area of about 50–80 cm<sup>2</sup>), the threshold at 250 Hz was 80 dB RMS acceleration, i.e., comparable with our results, especially in the Low force condition. In that study, it is hypothesized that the temporal dynamics of spectrally complex vibration might play a key role in detecting vibrotactile stimulation. In our case, however, the stimuli had no temporal dynamics. Sinusoidal stimuli resulted in lower RMS acceleration thresholds as compared to noise vibration. This may be explained intuitively by considering that equivalent RMS acceleration values for sinusoidal and noise stimuli actually result in a similar amount of vibration power being concentrated at 250 Hz (a frequency characterized by peak tactile sensitivity [55]), or spread across the 50–500 Hz band, respectively. This explanation is supported by the findings by Young et al. [64], who reported lower thresholds produced by sinusoidal stimuli than spectrally more complex signals (square and ramp waves).

The Pacinian channel, targeted by this study, is capable of spatial summation. Previous studies [21, 55] showed that for contact areas between 2 and 510 mm<sup>2</sup> at the thenar eminence of the hand, and for frequencies in the 40–800 Hz range, displacement thresholds decrease by approximately 3 dB with every doubling of the area. Intuitively, a reason for that is that the number of stimulated skin receptors increases with larger contact areas. In the present experiment, the interactive nature of the task resulted in high variability of the contact area [43]. The mean contact areas measured in the experiment were in the range 103–175 mm<sup>2</sup>, contributing to explaining the reported enhanced sensitivity.
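To give a sense of scale for the 3 dB-per-doubling rule, the sketch below turns it into a threshold shift as a function of contact area. It is a back-of-the-envelope illustration only: the rule was reported for the thenar eminence, so applying it to the fingertip contact areas of the present experiment is an assumption, and the function name is hypothetical.

```python
import math


def spatial_summation_shift_db(area_mm2, ref_area_mm2, db_per_doubling=3.0):
    """Expected threshold change (dB) when the contact area goes from
    ref_area_mm2 to area_mm2, assuming a fixed drop per doubling of area."""
    return -db_per_doubling * math.log2(area_mm2 / ref_area_mm2)


# Doubling the area lowers the threshold by 3 dB under this rule.
doubling_shift = spatial_summation_shift_db(200.0, 100.0)

# Going from the smallest to the largest mean contact area measured in the
# experiment (103 to 175 mm^2) would account for a drop of about 2.3 dB.
experiment_shift = spatial_summation_shift_db(175.0, 103.0)
```

Under these assumptions, spatial summation alone would explain only a small part of the difference between force conditions, which is in line with the text treating it as one contributing factor among several.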

The Pacinian channel is also sensitive to temporal summation, which lowers sensitivity thresholds and enhances sensation magnitude [21]. Verrillo [53] found that thresholds decrease for stimuli at 250 Hz for increasing duration up to about 1 s, when delivered through a 2.9 cm<sup>2</sup> contactor to the thenar eminence of the hand. Gescheider and Joelson [23] examined temporal summation with stimulus intensities ranging from the threshold to 40 dB above it: For 80 and 200 Hz stimuli, peak displacement thresholds were lowered by up to about 8 dB for duration increasing from 30 to 1000 ms. The present study made use of stimuli lasting 1.5 s, which likely contributed to enhancing vibrotactile sensitivity.

Large inter-individual differences in sensitivity were found in our experiment, which we could not fully explain by contact area or age. However, this observation is in accordance with other studies [1, 29, 36, 41]. Sources for large variations in sensitivity may be many. While exposure to vibration is a known occupational health issue and can cause acute impairment of tactile sensitivity [28], experience in conditions similar to the present experiment seemed a possible advantage. Therefore, we further analyzed the performance of musician participants, who are often exposed to vibrations when performing on their instruments: Indeed, musicians' mean threshold in the Low force condition was about 3 dB lower than non-musicians', but there was no significant difference at the other force levels. Overall, enhanced sensitivity in musicians—previously observed by other authors [14, 45, 65]—could not be confirmed.

By considering actively applied forces and unconstrained contact of the finger pad, the present study adopted a somewhat more ecological approach [24] as compared to the studies mentioned above. An analogous approach was adopted by Brisben et al. [9], who studied vibrotactile thresholds in an active task that required participants to grab a vibrating cylinder. While the exerted forces were not measured, in accordance with our results much lower thresholds were reported than in most of the previous literature: At 150 and 200 Hz, the average displacement threshold was 0.03 µm peak (down to 0.01 µm in some subjects), which is equivalent to RMS acceleration values of 85.5 dB at 150 Hz, and 90.5 dB at 200 Hz. The authors suggested that such low figures could be due to the multiple stimulation areas on the hand involved in grabbing the vibrating cylinder, the longitudinal direction of vibration, and the force exerted by the participants. A few studies report that active movement results in lower sensitivity thresholds [63] or a better percept, possibly due to the involvement of planning and additional cognitive load as compared to the passive case [52].

Despite its partially ecological setting, this experiment kept control over the generation of sinusoidal and noise vibrations, with focus on the region of maximal human vibrotactile sensitivity (200–300 Hz). Vibratory cues at the piano keyboard, however similar in form to the respective tones, are more complex than either of the conditions in Experiment 1 and are likely to be perceived differently depending on the type of touch and the number of depressed keys. The following experiment tested first vibration detection in a piano-playing task, and second whether active touch sensitivity threshold curves of Experiment 1 could predict the measured results.

## **4.3 Experiment 2: Vibration Detection at the Piano Keyboard During Performance**

A second experiment investigated vibrotactile sensitivity in a musical setting [19]. Specifically, the goal was to measure the ability of pianists to detect vibration at the keyboard while playing. Vibration detection was measured for single and multiple tones of varying pitch, duration, and dynamics.

## *4.3.1 Setup*

The experiment was performed at two separate laboratories using similar setups, centered around two Yamaha Disklavier pianos: A grand model DC3 M4 and an upright model DU1A with control unit DKC-850. The Disklaviers are MIDI-compliant acoustic pianos equipped with sensors for recording performances and electromechanical motors for playback. They can be switched from normal operation to a "silent mode." In the latter modality, the hammers do not hit the strings and therefore the instruments neither resound nor vibrate, while their MIDI features and other mechanical operations are left unaltered. The two setups are shown in Fig. 4.2.

During the experiment, the normal and silent modes were switched back and forth across trials, letting participants receive respectively either natural or no vibrations from the keys. In both configurations, participants were exposed to the same auditory feedback produced by a physical modeling piano synthesizer (Modartt Pianoteq), set to simulate either a grand or an upright piano, and driven in real time by MIDI data sent by the Disklaviers. The synthesized sound was reproduced through Sennheiser HDA-200 isolating reference headphones (grand piano) or Shure SE425 earphones (upright piano). In the latter case, 3M Peltor X5A earmuffs were worn on top of the earphones for additional isolation. Preliminary testing confirmed that through these setups the Disklaviers' operating modes (normal or silent) were indistinguishable while listening to the piano synthesizer from the performer's seat position, meaning that any acoustic sound coming from the pianos in normal mode was fully masked.

**Fig. 4.2** The two Disklavier setups used in the experiment. Left: Yamaha DC3 M4 grand piano. Right: Yamaha DU1A upright piano. Figure adapted from [19]

The loudness and dynamic response of the piano synthesizer were preliminarily calibrated to match those of the corresponding Disklavier model in use (details are given in Sect. 13.3.2).

Participants could sense the instrument's vibration only through their fingers on the keyboard. Other sources of vibration were excluded: The pedals were made inaccessible, while the stool, the player's feet, and the piano were isolated from the floor by various means [17]. Vibration measurements confirmed that, as a result of the mechanical insulation, playing the piano did not cause vibrations at the player's seat exceeding the noise floor in the room.

The experiment was conducted under human control, with the help of software developed in the Pure Data environment, which was used to: (i) read computer-generated playlists describing the experimental trials; (ii) set the Disklavier's playing mode accordingly; (iii) check if the requested tasks were executed correctly; (iv) record the participants' answers.

## *4.3.2 Procedure*

Sensitivity was measured at six A tones of different pitch ranging from A0 to A5, chosen after a pilot study [17] that reported a significant drop in detection above A5. Tone duration was either "long" (8 metronome beats at 120 BPM) or "short" (2 beats), and dynamics either "loud" (*mf* to *ff*, corresponding to MIDI key velocities in the range 72–108) or "soft" (*p* to *mp*, key velocities 36–54). In addition to single tones, participants were requested to play three-tone clusters around D4 and D5.

The experiment consisted of two parts: In part A, participants played long and loud single tones; in part B, tone dynamics and duration were modified so as to make the detection task potentially harder in the low range, where vibrations should be most easily perceived [17]. Additionally, by extending the contact area, the note clusters were expected to facilitate detection in the high range, where sensitivity should be low [17]. The conditions are summarized in Table 4.1.

**Table 4.1** Factors and conditions in the piano experiment

The experiment followed a 2AFC (yes/no) procedure, which required participants to report whether they had detected vibrations during a trial or not. Each condition was repeated eight times in normal mode and eight times in silent mode, in randomized order. However, part A was performed before part B.

Participants were instructed to use their index fingers for single keys or fingers 2-3-4 for chords and to play pitches lower than the middle C with their left hand and the rest with the right hand.

Fourteen piano students participated in the upright piano condition, and fourteen in the grand piano condition. Their average age was 27 years and they had on average 15 years of training, mainly on the acoustic piano.

## *4.3.3 Results*

Sensitivity index *d*′, as defined in signal detection theory [26], was computed for each subject and condition as follows:

$$d' = Z(\text{hits}) - Z(\text{false alarms}),$$

where *Z*(*p*) is the inverse of the Gaussian cumulative distribution function, hits is the proportion of "yes" responses with vibrations present, and false alarms is the proportion of "yes" responses with vibrations absent. Thus, a proportion of correct responses *p*(*c*) = 0.69 corresponds to *d*′ = 1 and chance performance *p*(*c*) = 0.50 to *d*′ = 0. Perfect proportions 1 and 0 would result in infinite *d*′ and were therefore corrected by (1 − 1/16) and (1/16), respectively [26].
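The computation of the sensitivity index, including the correction for perfect proportions, can be sketched as follows. This is an illustrative implementation and not the analysis code used in the study; the function name is hypothetical, and writing the correction as 1/(2*n*) is an assumption that reproduces the stated 1/16 for the eight repetitions per mode.

```python
from statistics import NormalDist


def d_prime(n_hits, n_false_alarms, n_signal=8, n_noise=8):
    """Sensitivity index d' = Z(hit rate) - Z(false-alarm rate).

    Proportions of exactly 1 or 0 are replaced by 1 - 1/(2n) and 1/(2n),
    which gives 1 - 1/16 and 1/16 for n = 8 trials per mode.
    """
    z = NormalDist().inv_cdf  # inverse of the Gaussian CDF

    def corrected(count, n):
        p = count / n
        if p == 1.0:
            return 1.0 - 1.0 / (2 * n)
        if p == 0.0:
            return 1.0 / (2 * n)
        return p

    return z(corrected(n_hits, n_signal)) - z(corrected(n_false_alarms, n_noise))


# A participant answering "yes" on 6 of 8 vibrating trials and on 2 of 8
# silent trials; this gives a d' of roughly 1.35.
example = d_prime(6, 2)

# Perfect performance stays finite thanks to the correction.
ceiling = d_prime(8, 0)
```

For an unbiased observer, the proportion of correct responses is the Gaussian CDF evaluated at *d*′/2, which is how *d*′ = 1 maps to *p*(*c*) ≈ 0.69.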

Results of part A are presented at the top of Fig. 4.3: Sensitivity was highest in the lower range and decreased toward higher pitches. At A4 (440 Hz), vibrations were still detected with mean *d*′ = 0.84, while at D5 (587 Hz) and A5, performance dropped to chance level. A mixed ANOVA indicated a significant main effect of pitch (*F*(6, 156) = 26.98, *p* < 0.001). The results for the upright and the grand piano did not differ significantly, nor was there a significant interaction of pitch and piano type. The Mauchly test showed that sphericity had not been violated.

**Fig. 4.3** Sensitivity *d*′ in part A (top) or parts A and B (bottom). Error bars represent the standard error of the mean [40]. Chance performance (*d*′ = 0) is represented by the dashed line. Figures reprinted from [19]

The results were collapsed over upright and grand pianos, and a trend analysis was conducted. A linear trend was significant (*t*(156) = −12.3, *p* < 0.0001), indicating that as pitch increases, sensitivity to vibrations decreases. Results from parts A and B are presented together at the bottom of Fig. 4.3, showing small differences in mean sensitivity between normal, soft, and short conditions. However, none of the contrasts between long and short duration or loud and soft dynamics at A0 or A1 was significant. The difference was more notable between clusters and single notes: For the cluster CDE4, sensitivity was significantly higher than for the isolated note D4 (*t*(294) = 5.96, *p* < 0.0001), whereas the much smaller difference between D5 and the cluster CDE5 was not significant. Even considering the possible effect of learning between parts A and B (average sensitivity at pitches A0 and A1 was 0.23 higher in part B), the result suggests that at D4, playing a cluster of notes facilitates vibration detection.

## *4.3.4 Vibration Characterization*

In order to gain further insight into the results, vibration signals at the keyboard were measured on both the grand and upright Disklaviers.

An in-depth description of the measurements and related issues is given in Sect. 13.3.2.2. For convenience, only essential details are reported here. Vibration signals were acquired for different MIDI velocities at each of the 88 keys of the Disklavier pianos via a measurement accelerometer and recorded as audio signals. A digital audio sequencer software was used to record vibration signals, while reproducing MIDI tracks that played back each single key of the Disklaviers. Additional MIDI tracks were used to play CDE4 and CDE5 clusters, while vibration was recorded with the accelerometer attached to the respective C, D, and E keys in sequence. The MIDI velocities were chosen to cover the entire dynamics reproducible by the Disklaviers' motors.

Acceleration signals had a large onset in the attack, corresponding to the initial fly of the keys followed by their impact against the keybed. Figure 4.4 shows a typical attack, recorded from the grand Disklavier playing the A2 note at MIDI velocity 12. These onsets, appearing in the first 200–250 ms, are not related to the vibratory response of the keys and were therefore manually removed from the samples.

Acceleration values in m/s<sup>2</sup> were computed from the acquired signals by making use of the nominal sensitivity parameters of the audio interface and the accelerometer. Similarly to what was done by Askenfelt and Jansson [4], the spectra of the resulting acceleration signals were compared to Verrillo's reference vibrotactile sensitivity curve [55]. Note that this curve reports sensitivity as the smallest, frequency-dependent displacement $A(f)$ (in meters) of a sinusoidal stimulus $s(t) = A(f)\sin(2\pi f t)$ that is detected at the fingertip. Therefore, a corresponding acceleration curve was computed from the original displacement curve in order to compare with our acceleration signals. Thanks to the sinusoidal nature of the stimuli employed by Verrillo, the corresponding acceleration signal could be found analytically as $\ddot{s}(t) = -A(f)\,(2\pi f)^2 \sin(2\pi f t)$. Consequently, the acceleration threshold curve $A(f)\,(2\pi f)^2$ was used for comparison to our signals. Confirming the results by Askenfelt and Jansson [4], no spectral peaks were found to exceed the acceleration threshold curve, even for notes played with high dynamics. To exemplify this, Fig. 4.5 shows the spectrum of the highest dynamics of the note that participants detected with the highest sensitivity (part A), i.e., A0 played at MIDI velocity 111, along with the threshold curve.
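The conversion from a displacement threshold to an acceleration threshold amounts to multiplying the displacement amplitude by the squared angular frequency. The sketch below illustrates this step only; the function name and the numerical value used in the example are hypothetical and are not read from Verrillo's curve.

```python
import math

def acceleration_threshold(displacement_m, freq_hz):
    """Peak acceleration (m/s^2) of a sinusoid with the given peak
    displacement (m) and frequency (Hz): A(f) * (2*pi*f)**2."""
    return displacement_m * (2 * math.pi * freq_hz) ** 2

# Hypothetical example: a 0.1 micrometer displacement at 250 Hz.
example_acc = acceleration_threshold(1e-7, 250.0)  # about 0.247 m/s^2
```

Because of the quadratic frequency factor, a displacement threshold that is flat over frequency corresponds to an acceleration threshold that rises by 12 dB per octave.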

Since Verrillo's thresholds cannot explain the results of Experiment 2, RMS acceleration values were computed in place of spectral peak amplitudes, in analogy with Experiment 1. Vibration signals were first processed with a specifically designed low-pass filter to shape stimuli according to the human vibrotactile band [19]. RMS values in dB were then extracted from the filtered signals over time windows equal to the lengths of the stimuli, that is 1 s for short and 4 s for long trials. Figure 4.6 shows the resulting RMS values for parts A and B, respectively, together with the RMS thresholds of vibration reported in Experiment 1. A comparison of the RMS acceleration values and perceptual thresholds for noise shown in these figures against the sensitivity curves of Fig. 4.3 suggests that RMS values of broadband stimuli have more potential to explain the results of Experiment 2.

**Fig. 4.6** RMS acceleration values of keys played as in part A (top) or parts A and B (bottom). The horizontal lines represent (min/max) vibrotactile thresholds as measured in Experiment 1 for noise and sinusoidal stimuli over a range of active pressing forces. Figure adapted from [19]
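The RMS extraction step can be illustrated with a short sketch. It assumes that the signal has already been low-pass filtered and cut to the stimulus length, and it uses a reference acceleration of 10<sup>−6</sup> m/s<sup>2</sup> for the dB scale; this reference is an assumption made for the example, since the text above does not state it. The function name `rms_db` is likewise hypothetical.

```python
import math

def rms_db(samples, ref=1e-6):
    """RMS level in dB of an acceleration signal (m/s^2).

    `ref` is the reference acceleration for 0 dB (assumed here to be
    1e-6 m/s^2). The filtering stage described in the text is omitted.
    """
    mean_square = sum(x * x for x in samples) / len(samples)
    return 20 * math.log10(math.sqrt(mean_square) / ref)

# Example: 1 s of a 100 Hz sinusoid with 1 m/s^2 peak amplitude.
fs = 8000  # hypothetical sampling rate in Hz
tone = [math.sin(2 * math.pi * 100 * n / fs) for n in range(fs)]
tone_level = rms_db(tone)  # about 117 dB re 1e-6 m/s^2
```

For a "long" trial the same function would simply be applied to a 4 s window instead of a 1 s window.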

## *4.3.5 Discussion*

The results presented in the previous section show that sensitivity to key vibrations is highest in the lowest range and decreases toward higher pitches. Vibrations are clearly detected in many cases where the vibration acceleration signals hardly reached typical thresholds found in the literature for sinusoidal stimuli.

The literature on the detection of complex stimuli provides support to our results, although it does not explain them completely. As already discussed in Sect. 4.2.4, Wyse et al. [61] report RMS acceleration threshold values at 250 Hz corresponding to 80 dB, a value compatible with our results. However, the characteristics of those stimuli may have occasionally produced significant energy at lower frequencies, causing the thresholds to lower once they were presented to the whole hand.

The pianist receives the initial transient when the hammer hits the string; then, the vibration energy promptly decreases and its partials fade, each with its own decay curve. The initial peak may produce an enhancement effect similar to those measured by Verrillo and Gescheider, albeit only for sinusoids [56], and hence contribute to sensitivity.

As discussed earlier, the P-channel is sensitive to the signal energy, while it is not able to recognize complex waveforms. Loudness summation instead occurs when vibration stimulates both the Pacinian and non-Pacinian (NP) channels, lowering the thresholds accordingly [7, 37, 56]. In our experiment, summation effects were likely to occur when the A0 key and, possibly, the A1 key were pressed. From A3 on, only the P-channel became responsible for vibration perception. Figure 4.3 seems to confirm these conclusions, since it shows a pronounced drop in sensitivity between A1 and A3 in both parts of Experiment 2. As Fig. 4.6 demonstrates, this drop is only partially explained by a proportional attenuation of the vibration energy in the grand piano, while it is not explained at all in the upright piano. Hence, it is reasonable to conclude that the NP-channel played a perceptual role until A3. Beyond that pitch, loudness summation effects ceased.

In analogy with Experiment 1, the results of this experiment also suggest the occurrence of spatial summation effects [10] when a cluster of notes, whose fundamentals overlap with the tactile band, is played instead of single notes. As Fig. 4.3 (bottom panel) shows, playing the cluster in the fourth octave boosted the detection in that octave, whereas the same effect did not occur in the fifth octave. Unlike Experiment 1, this summation originates from multifinger interaction rather than varying contact areas in single-finger interaction. This evidence opens an interesting question about the interaction of complex vibrations reaching the fingers simultaneously. Measurements of cutaneous vibration propagation patterns in the hand resulting from finger tapping show, however, an increase in both intensity and propagation distance with the number of fingers involved [49], which may partially explain the increased sensitivity we observed.

Unlike Experiment 1, where uni-modal tactile stimuli were used, here we employed bimodal audio-tactile stimuli. Therefore, the possibility of cross-modal amplification effects needs to be briefly discussed, even though Experiment 2 did not investigate this aspect. As discussed earlier, previous studies on cross-modal integration effects [46, 58] support the concrete possibility that an audible piano tone, whose vibratory components are a subset of the auditory components, helps detect a tactile signal near threshold. Although in our case the sound came from a synthesizer, both the auditory and tactile signals shared the same fundamental frequency of the piano tone, and furthermore the first partials were close to each other, respecting the hypothesis of proximity in frequency investigated by Wilson [58]. We did not test a condition in which subjects played the piano in normal mode in the absence of auditory feedback, or using sound uncorrelated with vibration (e.g., white noise). Although such a condition may provide significant data about the effective contribution of auditory cues to vibration detection on the piano, it would require a different experimental setup. Other cross-modal effects that may have instead contributed to impair the detection [62] should be considered as minor with respect to the spectral compatibility and temporal synchronization of the audio-tactile stimulus occurring when a piano key was pressed.

Yet another relevant difference with Experiment 1 is that in this case the pressing forces exerted by pianists were unknown and most likely not constant throughout a single trial. The maximum and minimum sensitivity threshold lines in Fig. 4.6, which report the results of Sect. 4.2.3, correspond to constant pressing forces of 1.9 and 8 N for noise vibration, and 1.9 and 15 N for 250 Hz sinusoidal vibration. These force values occur when piano keys are hit at dynamics between *pp* and *f*, with negligible difference between *struck* and *pressed* touch style [20, 33]. Conversely, *ff* dynamics require stronger forces up to 50 N [4]. In Experiment 2, it seems reasonable to assume that pianists initially pressed the key according to the dynamics required by the trial and then, once the key had reached the keybed, settled the finger force on a comfortable value while attending to the detection task. If our participants adapted finger forces toward the range mentioned above, then their performance in this experiment would fall in between the results for sinusoidal and noise stimuli in Experiment 1. Experiment 1 additionally found that, when using low finger force, musicians on average exhibit slightly better tactile acuity than non-musicians. Even if this difference was not significant, our participants could have reduced the finger force only after starting a trial that required loud dynamics, while leaving the force substantially unvaried during the entire task in the other cases. This behavior seems indeed quite natural.

The hypothesis that vibrotactile sensitivity to RMS acceleration falls in between the thresholds for 250 Hz sine wave and filtered noise is coherent with the temporal and spectral characteristics of the stimuli: Right after its initial transient, a piano tone closely resembles a decaying noisy sinusoid. For instance, it can be simulated by employing several hundred damped oscillators whose outputs are subsequently filtered using a high-order transfer characteristic [5]. A remaining question is whether the RMS acceleration values of filtered noise plotted in Fig. 4.6 explain our thresholds sufficiently, or if there is a need to discuss them further. Other elements in favor of further discussion are the mentioned potential existence of a cross-modal amplification and evidence of superior tactile acuity in musicians [65].

## **4.4 Conclusions**

We have given an introduction to the role of active touch in musical haptic research. A closed loop between musicians and their instrument during performance poses a major challenge to experimental setups: While playing, musicians themselves generate the vibrotactile feedback and are at the same time influenced by the produced sound. To discuss the possible links between music performance tasks and basic active touch psychophysics, we presented two experiments, one in a controlled and one in an ecological setting, showing evidence that pianists perceive keyboard vibrations with sensitivity values resembling those obtained under controlled active touch conditions. Overall, the results presented here suggest that research on active touch in musical performance may prove valuable for understanding the role, mechanisms, and prospective applications of active touch perception also outside the musical context. An example application that seems within immediate reach of current tactile interfaces is to create illusory effects of loudness change by varying the intensity of vibratory feedback [39, 42].

Although interesting and necessary, our results represent only a premise for further research activities aimed at precisely understanding the role of tactile feedback during piano playing. Exploratory experiments have already been performed in an attempt to understand whether changes in the "timbre" of tactile feedback may determine equivalent auditory sensations. Some results in this regard are presented in Sect. 5.3.2.2. If confirmed, after excluding the influence of non-airborne sonic cues on auditory perception, such results would imply the ability of the tactile and auditory systems to interact so as to form a wider, multimodal notion of musical timbre, for which some partial evidence has been found in musicians [59] and non-musicians [47]. Several questions related to the role of tactile feedback in musical performance remain open. For instance, feedback from percussion instruments is likely to define strong patterns of skin vibration extending far beyond the interaction point. The propagation of vibration across the skin has recently been an object of research with potentially interesting haptic applications outside the musical context [49]. It cannot be excluded that percussionists control their playing by testing specific wide-area tactile patterns they learned, and then retained in the somatosensory memory after years of practice with their instrument: Otherwise, drummers and percussionists should not experience a sense of unnatural interaction with the instrument when they play rubber pads and other digital interfaces. Furthermore, while it is not precisely known how wind instrument players make use of the vibrations transmitted by the mouthpiece, digital wind controllers like the Yamaha WX series never achieved wide popularity, possibly also due to their unnatural haptic feedback.

**Acknowledgements** The authors wish to thank Francesco and Valerio Zanini for recording piano vibrations and helping to carry out the piano experiment. This research was pursued as part of project AHMI (Audio-Haptic modalities in Musical Interfaces, 2014–2016), funded by the Swiss National Science Foundation.

## **References**


Proceedings of the Sound and Music Computing conference (SMC), Maynooth, Ireland, pp. 161–167 (2015)



# **Chapter 5 The Role of Haptic Cues in Musical Instrument Quality Perception**

**Charalampos Saitis, Hanna Järveläinen and Claudia Fritz**

**Abstract** We draw from recent research in violin quality evaluation and piano performance to examine whether the vibrotactile sensation felt when playing a musical instrument can have a perceptual effect on its judged quality from the perspective of the musician. Because of their respective sound production mechanisms, the violin and the piano offer unique example cases and diverse scenarios to study tactile aspects of musical interaction. Both violinists and pianists experience rich haptic feedback, but the former experience vibrations at more bodily parts than the latter. We observe that the vibrotactile component of the haptic feedback during playing, both for the violin and the piano, provides an important part of the integrated sensory information that the musician experiences when interacting with the instrument. In particular, the most recent studies illustrate that vibrations felt at the fingertips (left hand only for the violinist) can lead to an increase in perceived sound loudness and richness, suggesting the potential for more research in this direction.

## **5.1 Introduction**

Practicing a musical instrument is a rich multisensory experience. As explained in Chap. 2, the instrument and player form a complex system of sensory-motor interactions where the sensory feedback provided by the instrument as a response to a playing action (bowing, plucking, striking, blowing, pumping, rubbing, fingering) is shaped not only by *listening* to the sound produced by that action, but also by *feeling* the cutaneous vibrations (vibrotactile sensation) and reactive forces (proprioceptive sensation) resulting from the same action. In assessing the heard sound in terms of technical execution and expressive intention (pitch, timing, articulation, dynamics, timbre), the musician integrates additional haptic cues before the next sound is made in order to adjust their playing technique. In this sense, the perception and evaluation of the quality of a musical instrument, as seen from the perspective of the performer, are a rich multisensory experience as well.

C. Saitis (✉)
Audio Communication Group, Technische Universität Berlin, Sekretariat E-N 8, Einsteinufer 17c, 10587 Berlin, Germany e-mail: charalampos.saitis@campus.tu-berlin.de

H. Järveläinen
ICST—Institute for Computer Music and Sound Technology, Zürcher Hochschule der Künste, Pfingstweidstrasse 96, 8005 Zurich, Switzerland e-mail: hanna.jarvelainen@zhdk.ch

C. Fritz
Équipe LAM—Lutheries-Acoustique-Musique, Institut Jean le Rond d'Alembert UMR 7190, Université Pierre et Marie Curie - CNRS, 4 place Jussieu, 75005 Paris, France e-mail: fritz@lam.jussieu.fr

The proprioceptive component of the haptic feedback at a musical instrument is connected to the behavior of the instrument's (re)action. An instrument with a precise and responsive action allows a skilled musician to produce a wide variety of timbre nuances through fine-grained control of synchrony, dynamics, attack speed, articulation, and balance in polyphonic texture. Vibrotactile feedback, on the other hand, consists essentially of the same oscillations that the instrument body radiates as sound [42, 49, 69–71] and is perceived simultaneously with the auditory signal, but differently [4, 6, 18, 25, 31, 41, 45, 62, 65]. In contrast to hearing, where maximal sensitivity is in the range of 3000–4000 Hz, vibrotaction is most sensitive in the vicinity of 250 Hz (see Sect. 4.2), which is within the range of most orchestral instruments. Already at about 1000 Hz the sensation of vibrations is lost, whereas the range of most instruments extends well beyond this frequency. Tactile waveforms of varying type and complexity can be discriminated [1, 8, 51, 59, 72] and can activate areas of the auditory cortex in the absence of sound input [14]. Auditory and tactile frequency is likely calculated in an integrated fashion during preattentive sensory-perceptual processing, much earlier in the information processing chain than had been supposed [13]. An overview of further comparisons between the auditory and tactile modalities is given in Sect. 12.2. But is the vibrotactile sensation at a musical instrument perceptually relevant to its judged quality?

In the first part of this chapter, we will review recent research on the perceptual evaluation of violin quality from the perspective of the musician. Haptic feedback is particularly relevant in playing an instrument such as the violin where physical contact with the performer is highly intimate compared to other instruments due to the violin's sound making mechanism. The fingers, chin, and shoulder of the violinist are in immediate contact with the vibrating parts of the instrument, implying a rich source of haptic feedback, an understanding of which should help to reveal particular aspects of quality perception. We will initially discuss psycholinguistic evidence of how violin quality is conceptualized in the mind of the violinist during playing-based preference tasks and then describe a series of studies on the perception and quality evaluation effects of vibrotactile feedback at the left hand of the violinist in normal playing scenarios.

Alongside the violin, we have chosen the piano as a second example case. Here, the contact between the performer and the instrument is much less intimate compared to the violin. Traditional piano playing involves touching only the keys (modern piano repertoire may sometimes require hitting or plucking the strings) and pedals (mediated by shoes). The nature and origin of piano touch have long been a source of fundamental disagreement in music performance and perception research: Are the timbre and loudness of a single note determined solely by the velocity of the hammer, or can the pianist further control them through the type of touch? In the second part of this chapter, we will then review recent literature on haptic feedback when playing the piano, examining the relationship between touch and tone quality, and more generally the importance of vibrotactile feedback to the perceptual evaluation of piano quality by the performer.

## **5.2 Violin**

The violin as we know it today was developed in the early sixteenth century around Cremona in Italy and can be seen as the result of applying the tuning of the medieval rebec (fifths) to the body of the lira da braccio [16]. The transition from baroque to classical music led to a few further modifications in the second half of the eighteenth century, such as a longer, narrower fingerboard and neck. Since then, the basic violin lutherie has remained largely unchanged, combining visual charm with ergonomics and a precise acoustical function.

Sound is produced by bowing (or plucking) one or more strings at a location between the bridge and the edge of the fingerboard. The played string produces oscillations that are not efficiently radiated by the string itself due to its much smaller diameter than the acoustic wavelength of most audible frequencies [23]. Instead, the forces exerted from the vibrating string on the bridge cause the violin body to vibrate and thus radiate sound. The varying patterns in which different harmonics are transformed by the vibrating modes (resonances) of the body thus "color" the radiated sound. Figure 5.1 depicts a typical violin frequency response function (defined as the input admittance measured at the E-string notch on the bridge). Furthermore, violin body resonances exhibit a slow decay that brings a "ringing" quality to the sound [37]. At frequencies above about 1 kHz, the motions of the body create frequency-dependent directivity formations that add "flashing brilliance" to its sound [64].

## *5.2.1 Touch and the Conceptualization of Violin Quality by Musicians*

Attempts to quantify the characteristics of "good" and "bad" violins from vibrational measurements such as the input admittance (Fig. 5.1) and/or listening tests have largely been inconclusive (see [52] for a review). On the one hand, this may be due in part to overly broad characterizations of "good" and "bad." On the other hand, both approaches end up considering the instrument isolated from the musician and no haptic information is provided. Woodhouse was among the first to consider that what distinguishes one violin from another lies not only in its perceived sound quality but also in what he termed its *playability*, as in how the violinist "feels" the instrument and how easy it is to produce a good sound [68]. To this end, recent research on violin acoustics and quality has focused attention on the perceptual and cognitive processes involved when violinists assess violins under normal playing scenarios.

**Fig. 5.1** Input admittance of a violin obtained by exciting the G-string corner of the bridge with a miniature force hammer and measuring the velocity at the E-string corner of the bridge with a laser Doppler vibrometer [52]. The magnitude and phase are shown in the top and bottom plots, respectively. Some of the so-called signature modes (i.e., strongly radiating and thus crucial to violin sound) can be observed in the open string region, below about 600 Hz: the Helmholtz-type cavity mode A0 at around 280 Hz and the first strongly radiating corpus bending mode B1+ just above 500 Hz. Also important is the hill-like collection of peaks known as the "BH peak" (bridge and/or body hill) in the vicinity of 2–2.5 kHz, which allows a solo violin to be heard over an ensemble of instruments

Fritz and colleagues carried out a series of listening tests using virtual violins, whereby synthesized bridge-force signals were convolved with a digital filter mimicking the input admittance of the violin [29]. The measured admittance of a "good-quality" modern violin was first decomposed into its modal components, the parameters of which were then used to re-synthesize it, allowing for controlled variations of vibrato and body damping. Results showed that when listening to single notes, violinists found it difficult to assess the "liveliness" of the sound, and often, the word itself was not used in a consistent way across individuals. But when asked to *play* on an electric violin, whereby the actual bridge-force signal was passed through modified re-synthesized admittances in real time, musicians were able to rate liveliness consistently within and between individuals. This seems to suggest that liveliness is processed differently in passive listening versus active playing contexts, where haptic cues from proprioceptive and vibrotactile feedback are present.
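The idea of re-synthesizing an admittance from modal components can be illustrated with a generic modal superposition, in which each mode is described by a frequency, a quality factor (inversely related to damping), and an amplitude. The sketch below is an assumption-laden illustration rather than the model used in [29]: the formula is a textbook form for a driving-point admittance, and the mode values are hypothetical, chosen only to lie near the A0 and B1+ regions mentioned in the caption of Fig. 5.1.

```python
import math

def modal_admittance(freq_hz, modes):
    """Complex admittance at freq_hz as a sum of resonant modes.

    Each mode is a tuple (f_k, q_k, a_k): modal frequency in Hz,
    quality factor, and modal amplitude (arbitrary units here).
    """
    w = 2 * math.pi * freq_hz
    total = 0j
    for f_k, q_k, a_k in modes:
        w_k = 2 * math.pi * f_k
        total += 1j * w * a_k / (w_k**2 - w**2 + 1j * w * w_k / q_k)
    return total

# Hypothetical two-mode "body": values are illustrative only.
example_modes = [(280.0, 40.0, 1.0), (510.0, 50.0, 1.5)]
# Halving every Q factor (more damping) lowers the resonance peaks.
damped_modes = [(f, q / 2, a) for f, q, a in example_modes]
peak_original = abs(modal_admittance(280.0, example_modes))
peak_damped = abs(modal_admittance(280.0, damped_modes))
```

Editing the quality factors while keeping the modal frequencies fixed is one simple way to vary damping in a controlled manner in such a sketch.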

In another study, preference judgments made by three violin players during a listening and a playing test were compared in conjunction with psycholinguistic analyses of free-format verbal descriptions of musician experience provided by the three violinists [28]. The authors used a method from cognitive linguistics that relies on theoretical assumptions about cognitive-semantic categories and how they relate to natural language [20]. Categories can be thought of as collective representations and knowledge, to which individual assessments are conveyed by means of a shared discourse. From what is being said (content analysis) and how it is being said (linguistic analysis), relevant inferences about how people process and conceptualize sensory experiences can be derived (semantic level) and further correlated with physical parameters (perceptual level). This approach has been applied to other instruments such as the piano [11] and the guitar [50], providing novel insights into how musicians perceive instrumental sound as well as playing characteristics. Fritz and colleagues found that the overall evaluation of a violin, as reflected in the verbal responses of the musicians, varied between listening and playing conditions, with the latter invoking linguistic expressions influenced not only by the produced sound but also by the physical interaction between the performer and the instrument.

Saitis and colleagues carried out two violin playing perceptual tests based on a carefully controlled protocol [56, 57]. Emphasis was given to the design of conditions that are musically meaningful to the performer (e.g., playing versus listening, comparing different instruments like in a violin workshop, using own bow, allowing time to familiarize with the different violins, developing own strategy). In the first experiment, skilled violinists ranked a set of different violins from least to most preferred. In the second experiment, another group of players rated a different set of violins according to specific attributes as well as preference. In both experiments, musicians were asked to verbally describe their choices through open-ended questions. Analyses of intra-individual consistency and inter-player agreement in the (nonverbal) preference and attribute judgments showed that while violinists generally agreed on what particular attributes they look for in an instrument, the perceptual evaluation of the same attributes varied dramatically across individuals, thus resulting in large interplayer differences in the preference for violins. A third experiment [58] and studies by Fritz et al. [26, 27] and Wollman et al. [66, 67] reached similar conclusions.

To better understand the perceptual and cognitive processes involved when violinists evaluate violins, Saitis and colleagues further analyzed the verbal expressions collected in their two violin playing tests [53–55], expanding on an earlier work of Fritz et al. [28]. Based on psycholinguistic inferences, it was argued that violin players of varying style and expertise share a common framework for conceptualizing violin quality on the basis of semantic features and psychological effects that integrate perceptual attributes (i.e., perceptual correlates of physical characteristics) of not only the sound produced but also the vibrotactile and proprioceptive sensations experienced when playing the instrument (Fig. 5.2). The bowed string and vibrating body system contribute to the perception of sound quality through (a) the amount of felt vibrations in the left hand, shoulder, and chin (conceptualized as *resonance*); (b) assessing the offset (*speed*) and amount (*ease*) of reactive force (conceptualized as *response*) from the body in the right hand (through the bow) with respect to the quality and intensity of the heard as well as felt vibrations; and (c) comparing these between different notes across the instrument's register (conceptualized as *balance across strings*).

These psycholinguistic investigations provide empirical evidence that vibrations from the violin body and the bowed string (via the bow) are used by violinists as extra-auditory cues that not only help better control the played sound [4], but also contribute to a crossmodal audio-tactile assessment of its attributes. The perception of violin sound quality is thus elaborated both from sensations linked to auditory information and from haptic factors associated with proprioceptive and vibrotactile cues. The cognitive model shown in Fig. 5.2 raises interesting questions concerning the characterization of haptic feedback in violin playing quality tests: what to measure and how? Can standard vibrational measurements, such as a violin's bridge admittance (Fig. 5.1), capture everything significant about the reactive force and vibration levels felt by the player? If yes, in what ways can this information be extracted?

**Fig. 5.2** From body vibrations to semantic categories: a cognitive model describing how the perception of violin quality is elaborated on the basis of both auditory and haptic cues [55]

## *5.2.2 Vibrotactile Feedback at the Left Hand*

Acoustics and psychophysics literature on the "feel" of a violin has been limited compared to the ample amount of research on the instrument's sound. Marshall suggested that violin neck vibrations felt through the left hand form the basis for the perception of how a violin feels [43, 44]. He argued that the more often the left hand detects motions at antinodal parts of the neck (which are typically damped when the musician holds the violin but can be sensed directly on the skin), the more "alive" the violin will feel. Askenfelt and Jansson showed that vibrations perpendicular to the side of the neck, measured on four violins of varying quality while playing a single note (lowest *G*, 196 Hz), were above or very close to vibration sensation thresholds measured at the fingertip under passive touch conditions by Verrillo [61]. However, no evidence was found that higher neck vibration intensity would result in judging a violin as being of better quality [4]. One limitation of that study was that vibration amplitude was measured for five frequencies only, corresponding to the first five harmonics of the played note and thus lying below the 1 kHz upper limit of the human skin sensitivity range. Another potential issue (discussed in Sect. 4.3.4 for the piano) is that Verrillo's thresholds may not fully reflect actual vibration detection offsets when the left hand holds the neck of the violin (e.g., differences in location and size of contact area, pressure exerted from the hand on the neck).

**Fig. 5.3** Horizontal vibration levels at the side of the necks of violins (first position) perceived as either **a** "vibrating" or **b** "non-vibrating" (solid lines) and vibration sensation threshold at the left hand of violinists (dashed line). Reproduced from [65] with permission from S. Hirzel Verlag

Wollman and colleagues were the first to systematically address the role of haptic cues from neck vibrations on violin quality perception. Expanding on the work of Askenfelt and Jansson [4], vibration levels were measured at the violin neck in first position<sup>1</sup> across a set of ten instruments, which were characterized by a professional violinist according to how "vibrating" they were felt to be [65]. Neck vibration frequency response curves of "vibrating" and "non-vibrating" violins, obtained across the whole range of the instrument through laser vibrometry, were then compared to absolute vibrotactile thresholds measured on fourteen violinists holding in first position a real isolated violin neck vibrating at six frequencies between 196 and 800 Hz (the first four were chosen to correspond to the open strings). This setup helped obtain violin playing-specific thresholds (i.e., measured under active touch conditions, similar to what was done in Sect. 4.3 for the piano) that are more appropriate to compare with vibration levels than those measured by Verrillo [61] and used by Askenfelt and Jansson. It was observed that while neck vibrations of "vibrating" violins were well above the detection threshold by an average of 15 dB in the range 200–800 Hz, those of "non-vibrating" violins exhibited a steep attenuation of about 40 dB around 600 Hz and stayed below or close to the threshold above that (Fig. 5.3).
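To give a feel for the decibel figures quoted above, the following minimal sketch converts level differences into linear amplitude ratios. It assumes that the reported levels are amplitude-type quantities (such as acceleration or velocity), so that a difference of L dB corresponds to a ratio of 10^(L/20); the function name is illustrative and is not taken from [65].

```python
def db_to_amplitude_ratio(level_db: float) -> float:
    """Convert a level difference in dB to a linear amplitude ratio (20*log10 convention)."""
    return 10.0 ** (level_db / 20.0)


# Level differences quoted in the text (assumed to be amplitude-type levels).
above_threshold = db_to_amplitude_ratio(15.0)   # "vibrating" violins, 200-800 Hz
attenuation = db_to_amplitude_ratio(-40.0)      # "non-vibrating" violins, around 600 Hz

print(f"+15 dB -> amplitude ratio of about {above_threshold:.1f}")
print(f"-40 dB -> amplitude ratio of about {attenuation:.2f}")
```

Under this assumption, 15 dB above threshold means an amplitude between five and six times the threshold value, while a 40 dB attenuation reduces the amplitude to one hundredth.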

In another study [66], fifteen professional musicians listened to three violins while seated on a chair and holding a real isolated violin neck on which they fingered the performed score. The instruments were being played live by another violinist (nonparticipant) in the same room, placed behind a curtain in front of the participants.

<sup>1</sup>"Position" refers to where the left hand is placed on the string. In the first position, the index finger presses the string at the scroll end of the fingerboard, which produces the next note (full tone) up from the open string (e.g., on the *G* string, first position corresponds to A).

Along with the live sound, vibrations of the played violins were picked up at the scroll using a small accelerometer and then transmitted through a shaker system to the isolated neck (Fig. 5.4). They were presented either at the same level as in the played violin, reduced by half, or fully attenuated. This condition was described by the authors as *active listening*. Participants were asked to rate the violins on *richness of sound*, *loudness*, *responsiveness*, and *pleasure of playing*. It was observed that violinists judged all three violins as having a less loud but also a less rich sound whenever the level of vibrations felt on the isolated neck was reduced by half (Fig. 5.5). These results complemented the findings of Yau and colleagues, who have shown that in a non-musical context, the simultaneous presentation of tactile distractors can cause an increase in perceived tone loudness [71].

**Fig. 5.4** Experimental setup for transmitting vibrations from the neck of a played violin to an isolated neck [66]. Reproduced with the permission of the Acoustical Society of America

In a third experiment [67], twenty violinists evaluated five violins under three sensory masking conditions: playing without hearing the produced sound, playing without feeling the produced vibrations, and playing normally (i.e., neither modality was masked). Auditory feedback was masked by means of earmuffs and in-ear monitors playing white noise with a bandwidth of 20–20000 Hz, while passive antivibration material was added to the chin rest to minimize bone conduction. Vibrations were primarily masked on the left hand using vibrating rings worn on the thumb, index, and ring fingers, while vibrations through the chin and shoulder rests were attenuated as in the auditory masking scenario. In each condition, musicians first rated each violin on a number of criteria related to perceived sound and playing characteristics and then commented on how relevant those criteria were each time. These data provided further evidence that the perceptual evaluation of violin attributes such as liveliness, power, evenness across the strings, or dynamic range relies not only on sonic information but also on vibrotactile cues. Concerning overall preferences, it was observed that removing auditory feedback was not more disruptive than attenuating felt vibrations, although its effect tended to depend on the instrument (Fig. 5.6).

These studies indicate that the violin neck vibrations felt by the violinist through the left hand can serve as an important cue to the concept of "feel" in violin quality evaluation, as well as augment the perception of qualities attributed to the sound (in that case "loud" and "rich"). They also introduce novel methods for characterizing vibrotactile feedback at the left hand. Another source of haptic cues that potentially relate to perceived "feel" and sound quality is the vibration of the chin rest. Askenfelt and Jansson argued that the jaw is less sensitive than the left hand, but it may still be possible for the violinist to sense these vibrations because of the larger contact area of the jaw with the chin rest [4]. Similarly to the violin neck, it would be interesting to investigate whether vibrotactile feedback at the chin contributes to the perception of a violin's "feel" and/or sound.

**Fig. 5.6** Mean preference ratings of five violins under three different playing conditions (COND): normal (N), masked auditory feedback (noA), masked tactile feedback (noT). Vertical bars represent the standard errors of the mean. Reproduced from [67]; published under the Creative Commons Attribution (CC BY) license

## **5.3 Piano**

The modern piano, descending from the harpsichord and introduced by Bartolomeo Cristofori in 1709, evolved into two distinct types, the grand piano and the upright piano. The latter was developed in the middle of the nineteenth century, and its action differs somewhat from that of the former due to design constraints, although they share the same sound production principle [23]: A piano string is set in vibration when the respective key is depressed, a damper raised, and a felt hammer hits the string (Fig. 5.7). String vibrations are transmitted through the bridge to the soundboard, from which the sound radiates into the air. Modal structure of the soundboard and material properties further contribute to the acoustics of the piano. The sound is characterized by different decay rates between partials [21], a two-part pattern of time decay (or double decay) due to double and triple unison strings [63], and inharmonicity in terms of stretching of the partials due to string stiffness [22].

**Fig. 5.7** Illustration of the function of the piano action at successive stages during a keystroke. **a** *Rest position*: The hammer rests via the hammer roller on the repetition lever, a part of the lever body. The lever body stands on the key, supported by the capstan screw. The weight of the hammer and lever body holds the playing end of the key in its upper position. The damper is resting on the string. **b** *Acceleration*: When the pianist's finger depresses the key, the lever body is rotated upward. The jack, mounted on the lever body, pushes on the roller and accelerates the hammer. The damper is lifted off the string by the inner end of the key. **c** *Let-off*: The tail end of the jack is stopped by the escapement dolly, and the top of the jack is rotated away from the hammer roller. The free hammer continues toward the string. The repetition lever is stopped in waiting position by the drop screw. **d** *Check*: The rebounding hammer falls with the hammer roller on the repetition lever in front of the tripped jack. The hammer is captured at the hammer head by the check at the inner end of the key. Reprinted from [3] with the permission of the Acoustical Society of America
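The stretching of partials caused by string stiffness, mentioned in the introduction to this section, is commonly described by the stiff-string relation f_n = n f_0 sqrt(1 + B n^2), where B is an inharmonicity coefficient. The sketch below evaluates this relation; the note frequency and the value of B are illustrative assumptions rather than measurements reported in [22].

```python
import math


def stiff_string_partial(n: int, f0: float, b: float) -> float:
    """Frequency of the n-th partial of a stiff string: n * f0 * sqrt(1 + B * n^2)."""
    return n * f0 * math.sqrt(1.0 + b * n * n)


def stretch_in_cents(n: int, b: float) -> float:
    """Deviation of the n-th partial from the exact harmonic n * f0, in cents."""
    return 1200.0 * math.log2(math.sqrt(1.0 + b * n * n))


F0 = 261.63   # nominal C4 in Hz (illustrative)
B = 4e-4      # hypothetical inharmonicity coefficient, not a measured value

for n in (1, 2, 4, 8):
    f_n = stiff_string_partial(n, F0, B)
    print(f"partial {n}: {f_n:8.2f} Hz, {stretch_in_cents(n, B):5.2f} cents sharp")
```

Because the correction term grows with n squared, higher partials are progressively sharper than the corresponding exact harmonics, which is the "stretching" referred to in the text.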

## *5.3.1 Piano Touch and Tone Quality*

There is a long-standing discrepancy between the acoustical basis of how the timbre of a single piano tone is created and the practical experience of piano performers [3, 5]. When considering only the mechanics of the hammer-string interaction, piano timbre would be an instrument-specific result of loudness, which in turn depends on the velocity at which the hammer hits the string, controlled only through key velocity produced by the finger pressing force of the player. The way of touching the key would therefore have no influence on the resulting timbre. Skilled pianists, on the other hand, aim to control timbre and loudness independently through touch and gestural means involving movements of the entire upper body. A review on the historical development of various schools on piano technique as well as recent performance analysis and biomechanical studies on piano touch is presented by MacRitchie [40].

There is some evidence in favor of the touch effect, although it seems to be weaker than many pianists believe and mostly caused by other aspects of the sound than the tonal component. Goebl and colleagues measured the ability of pianists to perceive differences in piano sound independently of intensity [35]. Half of the participants were able to correctly distinguish between struck and pressed touch in the presence of finger-key noises occurring 20–200 ms before the sound. When the noises were cut from the sound signals, performance dropped to chance level. Pianists were also able to distinguish piano sounds of equal hammer velocity with either present or absent key-keybed noises with an average of 82% accuracy [34]. Askenfelt observed that structure-borne transients, dependent on the type of touch and present 20–30 ms before the first transversal wave on the string arrives at the bridge, may potentially be connected with the pianist's touch [2]. More recently, numerical simulations of the hammer head-shank interaction showed a difference in spectral profile between legato and staccato sounds in the range of 500–1000 Hz [17]; however, an effect on perceived timbre was not shown experimentally. Suzuki reported a slight spectral brightening for G5, on the order of 1.5 dB at the tenth partial, as a result of "hard" or "soft" touch depending on the degree of stiffness of shoulder, elbow, wrist, and finger [60]. When listening only, about half of the participants could distinguish an effect of similar degree after training.

To discover how pianists achieve fine-grained control of their instrument's sound, the way they describe and recognize timbre nuances in piano performance has gained interest. Bernays and Traube quantified a semantic space of five descriptors (*dry, bright, round, velvety*, and *dark*) [10] based on an analysis of free verbalizations provided by pianists [7] and conducted a series of studies where pianists performed pieces highlighting each of the five semantic dimensions of piano timbre. Despite differences between musicians relating to individual playing styles, common timbre nuance strategies were revealed across different performances [11, 12]. The latter were saliently grouped by the intended timbre on a bidimensional space by means of principal components analysis. The first component was found to be associated with dynamics, attack, and soft pedal features, while the second dimension was related to sustain pedal. Further playing style factors included key depression depth, legato versus staccato articulation, and balance between hands.

Given pianists' common ways of nuance control, the question arises whether listeners can differentiate and identify the resulting timbres in piano performance. To this end, Bernays reported a pilot study where listeners both described freely and identified in a forced-choice task the timbre of piano performance excerpts, each intended to reflect one of the following timbre nuances: *bright, dark, distant, full-bodied, harsh, matte, round*, and *shimmering* [9]. Participants identified the timbre categories above chance level except for *round* and *matte*. Some categories, like *bright* and *shimmering*, were frequently mixed up, probably due to their semantic proximity.

These studies have revealed that pianists can control timbre independently of dynamics: The way of touching the keys produces differences in contact noises (finger-key, key-key bottom, and release sounds) as well as slight spectral effects. While these may be inaudible to the average listener, they have a stronger and more important effect on the skilled pianist due to sensory integration of the matching touch and sound information [15]. Especially in polyphonic touch, these subtle vibrotactile cues may enable the player to produce and control a wide range of timbre nuances.

## *5.3.2 Haptic Cues and Instrument Quality*

Some early experiments on multimodal perception of piano quality were conducted by Galembo and Askenfelt [30], in which pianists evaluated four concert grand pianos under varying sensory feedback conditions. When freely playing the instruments, professional pianists ranked them as expected according to the manufacturers' reputation. However, musicians failed to identify the pianos in a listening-only condition, nor was the resulting quality ranking equal to the playing-based evaluation. In a subsequent free playing task, where visual feedback was blocked by blindfolding the musicians and auditory feedback was blocked through masking noise, the pianists were actually able to identify the instruments without difficulty. These experiments offer some evidence that pianos can be identified by their haptic response perhaps even better than by their sound. As an underlying mechanism, one should expect that different piano actions react differently to different dynamics and types of touch and that these differences are perceivable and possibly of more importance than auditory cues to the player.

Askenfelt and Jansson had previously made timing measurements of the various parts of the piano action and observed differences mainly as a function of dynamics and regulation of the action (mechanical adjustments to compensate for the effects of wear) [3]. Goebl et al. [36] studied in detail the temporal behavior of three grand piano actions. Touch-related differences were found through measurements of finger-key, hammer-string, and key-keybed contact times and maximum hammer velocities throughout the entire dynamic range for several keys. A different key velocity trajectory in struck and pressed sounds was also observed. Struck sounds showed two acceleration phases of key velocity, while the pressed sounds developed more linearly. These differences between struck and pressed touch were observed in all three pianos that were measured. However, it remains unknown how the behavior of the piano action may affect the player experience. The authors of the study hypothesize that since the pianist needs to (unconsciously) estimate the path from touch to tone onset and intensity for various dynamics and types of touch, a high-quality instrument is one that has a precise and consistent action. In their own informal evaluation as pianists, the most highly appreciated instrument turned out to have the lowest compressibility of the parts of the action, short free-travel times of the hammer, and late maxima in the hammer velocity trajectory.

#### **5.3.2.1 Vibrations in the Acoustic Piano**

Keane analyzed keyboard vibrations at four upright and four grand pianos by removing harmonic peaks from the spectrum of the vibration signal and thus splitting it into tonal and broadband parts [38]. Similar tonal components were observed across the two piano types, but upright pianos showed a stronger broadband component, which could explain the generally lower perceived quality of upright versus grand pianos. In fact, a later study showed that pianists preferred the tone quality and loudness profile of an upright piano with attenuated broadband vibrations [39].
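The idea of splitting a vibration spectrum into tonal and broadband parts can be illustrated with a toy example. The sketch below is not the procedure used in [38]; it simply assumes that every spectral bin lying within a fixed distance of an integer multiple of the fundamental belongs to the tonal part and that all remaining bins are broadband. The function names, the tolerance `half_width_hz`, and the spectrum itself are made up for illustration.

```python
def split_tonal_broadband(freqs, mags, f0, half_width_hz=5.0):
    """Split a magnitude spectrum into tonal and broadband parts.

    Bins within half_width_hz of an integer multiple of f0 are treated as
    tonal (harmonic peaks); every other bin is treated as broadband.
    """
    tonal, broadband = [], []
    for f, m in zip(freqs, mags):
        k = round(f / f0)                      # index of the nearest harmonic
        is_peak = k >= 1 and abs(f - k * f0) <= half_width_hz
        tonal.append(m if is_peak else 0.0)
        broadband.append(0.0 if is_peak else m)
    return tonal, broadband


def energy(mags):
    """Sum of squared magnitudes, used here as a simple energy measure."""
    return sum(m * m for m in mags)


# Toy spectrum: a flat noise floor with peaks at harmonics of 110 Hz (made-up data).
f0 = 110.0
freqs = [float(f) for f in range(0, 1001, 10)]
mags = [1.0 if (f > 0 and f % 110 == 0) else 0.1 for f in freqs]

tonal, broadband = split_tonal_broadband(freqs, mags, f0)
print("tonal energy:", round(energy(tonal), 3))
print("broadband energy:", round(energy(broadband), 3))
```

With such a split, the relative strength of the broadband component can be compared across instruments, which is the kind of comparison between upright and grand pianos described above.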

Fontana and colleagues investigated the effect of key vibrations on acoustic piano quality using both a grand and an upright Yamaha Disklavier, which can operate in both an acoustic and silent mode [25]. While playing, pianists received auditory feedback through a piano software synthesizer and tactile feedback through the Disklavier keyboard. The technical setup is described in more detail in Sect. 4.3.1. The experimental task involved comparing a non-vibrating to a vibrating piano setup during free playing according to several quality attributes. In the non-vibrating setup (A), the Disklavier was operating in silent mode, which prevents the hammers from hitting the strings and thus from producing vibrations. In the vibrating setup (B), the Disklavier was operating in acoustic mode, allowing the natural vibration of the strings to be transmitted to the soundboard as well as to the keys. However, the acoustically produced sound was blocked by insulating earmuffs placed on top of the earphones playing back the synthetic piano sound. Pianists rated the following attributes on a continuous scale ranging from −3 ("A much better than B") to +3 ("B much better than A"): *dynamic range*, *loudness*, *richness*, *naturalness*, and *preference*. All attributes except *preference* were rated separately in the low (keys below D3), mid (keys between D3 and A5), and high (keys above A5) range.

For both the grand and the upright piano type, the vibrating setup was preferred to the non-vibrating condition (Fig. 5.8). The mean *preference* scores were 1.05 (*n* = 15, SD = 1.48) for the upright piano and 0.77 (*n* = 10, SD = 1.71) for the grand piano. The distributions of the *preference* ratings did not differ significantly between pianos. Interestingly, while the participants generally preferred when vibrations were present, in the subsequent debriefing only one of them could pinpoint vibration as the difference between the setups. There was considerable positive correlation between attribute scales and frequency ranges. Ratings correlated highly between the low and mid ranges (mean Pearson ρ = 0.58) and between the mid and high regions (ρ = 0.43). At a later stage, a vibration detection sensitivity experiment conducted using the same setup (see Sect. 4.3) showed that piano key vibrations are perceived roughly up to note A4 (440 Hz). As such, the high range was entirely beyond the sensitivity range. That said, the detection experiment was performed with controlled timing and with single notes or three-note clusters in the high range, while a free playing task constitutes a more ecological setting (usually involving multifinger interaction). This may explain the slight effect of vibration on higher frequencies in the latter. For further analysis, new dependent variables were formed by taking the average over the low- and mid-frequency ranges. Partial correlation analysis and principal components analysis suggested that *naturalness* and *richness of tone* were the attributes most associated with increased *preference*.

Inter-individual consistency was low in both piano groups, suggesting high disagreement between individuals. Specifically, five participants preferred the non-vibrating setup. When the negative preference rating was used as a criterion for a posteriori segmentation [48], the attitudes of the two groups segregated clearly. While the negative and positive groups gave rather similar ratings for *dynamic range* and *loudness*, their mean ratings for *richness, naturalness*, and *preference* were clearly different (Fig. 5.9). The mean *preference* ratings were 1.58 (*n* = 20, SD = 0.79) and −1.61 (*n* = 5, SD = 1.10) for the positive and negative groups, respectively. Thus, while 80% of the participants associated *dynamic range* and *loudness* with *naturalness, richness*, and *preference*, the remaining 20% had the opposite opinion.
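As a quick consistency check on the reported figures, the group means quoted in this section can be pooled into an overall mean *preference* in two ways: by piano type and by a posteriori attitude group. The sketch below uses only the sample sizes and means given in the text; the helper name `pooled_mean` is illustrative.

```python
def pooled_mean(groups):
    """Weighted mean of group means; groups is a list of (n, mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


by_piano = [(15, 1.05), (10, 0.77)]       # upright, grand
by_attitude = [(20, 1.58), (5, -1.61)]    # positive, negative group

print(f"pooled mean by piano type: {pooled_mean(by_piano):.2f}")
print(f"pooled mean by attitude:   {pooled_mean(by_attitude):.2f}")
print(f"share of negative group:   {5 / (20 + 5):.0%}")
```

Both splits yield an overall mean of about 0.94 across the 25 participants, and the five participants in the negative group correspond to the 20% mentioned above.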

#### **5.3.2.2 Digital Piano Augmented with Vibrations**

A recent study on the effect of the nature of vibration feedback on perceived piano sound quality suggested that pianists may well be sensitive to the match between the auditory and the vibrotactile feedback [24]. The experimental setup (described in detail in Sect. 13.3.2) involved a digital keyboard enhanced by both realistic and synthetic key vibrations. Realistic vibrations were recorded from a Yamaha Disklavier grand piano. Synthetic vibration signals were generated using bandpass-filtered white noise, centered at the pitch and matching the amplitude envelope and energy of the recorded vibrations. They were interpolated according to key velocity and reproduced by transducers attached to the bottom of a digital piano. The reference setup consisted of auditory feedback only (A). The three test setups consisted of auditory feedback plus (B) recorded real vibrations, (C) recorded real vibrations with 9 dB boost, and (D) synthetic vibrations. Each of the test setups was compared to the reference setup in a free playing task, similar to what was described above for the acoustic piano. Ratings were given on *dynamic control, richness, engagement, naturalness*, and overall *preference*.
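The construction of the synthetic vibration signals described above (bandpass-filtered white noise centered at the pitch, shaped by an amplitude envelope and matched in energy) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in [24]: the second-order filter, its Q factor, the sample rate, the exponential envelope, and the target energy are all hypothetical stand-ins for quantities that would be derived from the recorded vibrations.

```python
import math
import random


def bandpass_noise(center_hz, fs, n_samples, q=5.0, seed=0):
    """White noise filtered by a second-order (biquad) bandpass centered at center_hz."""
    rng = random.Random(seed)
    w0 = 2.0 * math.pi * center_hz / fs
    alpha = math.sin(w0) / (2.0 * q)
    b0, b2 = alpha, -alpha                      # b1 is zero for this bandpass form
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for _ in range(n_samples):
        x0 = rng.uniform(-1.0, 1.0)             # uniform white noise sample
        y0 = (b0 * x0 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y0)
        x2, x1, y2, y1 = x1, x0, y1, y0
    return out


def shape_and_match_energy(noise, envelope, target_energy):
    """Apply an amplitude envelope, then rescale so the energy equals target_energy."""
    shaped = [s * e for s, e in zip(noise, envelope)]
    current = sum(s * s for s in shaped)
    gain = math.sqrt(target_energy / current)
    return [s * gain for s in shaped]


fs = 8000                       # sample rate in Hz (assumed)
n = fs                          # one second of signal
pitch_hz = 220.0                # A3, chosen only for illustration
# Stand-in for an envelope extracted from a recorded key vibration.
envelope = [math.exp(-3.0 * i / fs) for i in range(n)]
target_energy = 2.5             # stand-in for the energy of the recording

synthetic = shape_and_match_energy(
    bandpass_noise(pitch_hz, fs, n), envelope, target_energy
)
print("energy of synthetic signal:", round(sum(s * s for s in synthetic), 6))
```

In a real setup, the envelope and target energy would be measured from the recorded vibration of each key and velocity level, so that the synthetic signal differs from the recording only in its fine spectral structure.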

On average, participants preferred the vibrating setup in all categories except for *naturalness* in condition D (Fig. 5.10). The strongest preferences were for *dynamic control* and *engagement*. Generally, condition C was the most preferred of the vibration conditions: It scored highest on four of the five scales, although B was considered the most natural. Interestingly, B scored lowest on all other scales. Similar to the Disklavier experiment discussed in the previous section, participants could be classified a posteriori into two groups, where median *preference* ratings for setup C were +2.0 and −1.5 for each group, respectively. In the larger group of positive preference (*n* = 8), nearly all attributes were rated positively versus only one in the smaller, negative group (*n* = 3). Notably, although auditory feedback remained unchanged, participants associated higher preference for the vibrating setup with *richness of tone*, which, during preparation for the experiment, was explained to them as a sound-related attribute. This supports the hypothesis that from the perspective of the musician, the perception of instrument quality emerges through the integration of both auditory and haptic information.

**Fig. 5.10** Results of the digital piano quality experiment described in [24]. Boxplot presenting median and quartiles for each attribute scale and vibration condition. Positive values indicate preference for the vibrating setup

## **5.4 Conclusions**

The perceptual evaluation of musical instrument quality has traditionally been considered a unisensory experience in the scientific and industrial world alike, based exclusively on how the produced tone sounds in terms of pitch, dynamics, articulation, and timbre. To a certain extent, this is naturally expected. After all, the objective of playing a musical instrument is to make (musical) sounds. But while this holds true for the non-musician listener, it only tells part of the story from the perspective of the musician, where aural impression is accompanied by haptic feedback due to one or more bodily parts of the player physically touching vibrating components of the instrument. Well-established theories of sensory-motor multimodal interaction and auditory-tactile multisensory integration in the analytical and empirical study of music performance assert that haptic cues carry important information concerning the control of the (re)action of the instrument and thus its sound and that temporal frequency representations are perceptually linked across audition and touch.

The violin and the piano offer unique example cases to examine whether the haptic interaction between the musician and the instrument can have a perceptual effect on quality evaluation. Both instruments require a significant amount of sensory-motor synergy to produce refined and precise sonic events, providing rich haptic feedback to the performer. At the same time, unlike pianists, violinists experience vibrations at other bodily parts than the hands, which makes it difficult to measure performance parameters and control vibrotactile feedback in normal playing experimental scenarios. The physical differences in the violin versus piano touch and the experimental freedoms or constraints imposed by them can help better understand the role of vibrotaction on the playing experience as well as the expressive possibilities it can afford in varying performance contexts. Particularly in the case of the piano, the MIDI protocol and the availability of computer-controlled keyboard instruments such as the Yamaha Disklavier and Bösendorfer CEUS offer fertile opportunities to obtain detailed piano performance data under well-controlled but musically meaningful experimental conditions, although with some limitations [33].

Our review has shown that the vibrotactile component of the haptic feedback during playing, both for the violin and the piano, provides an important part of the integrated sensory information that the musician experiences when interacting with the instrument. In particular, the most recent violin and piano studies provide evidence that vibrations felt at the fingertips (left hand only for the violinist) can lead to an increase in perceived sound loudness and richness, suggesting the potential for more research in this direction. Investigations of the type and role of musical haptic feedback have also been reported for other instruments (e.g., [19, 31, 32]) as well as singing [47]. A vast field of topics awaits investigation, starting from the methods and aspects of instrument quality evaluation per se [15]. In which aspects does haptic feedback have a significant role? Which performance parameters (for example, timing accuracy) can be used to assess the haptic dimension in instrument quality perception?

**Acknowledgements** This work was supported by a Humboldt Research Fellowship awarded to Charalampos Saitis by the Alexander von Humboldt Foundation. Part of the research was pursued within the Audio-Haptic modalities in Musical Interfaces (AHMI) project funded by the Swiss National Science Foundation (2014–2016). Hanna Järveläinen wishes to thank Federico Fontana, Stefano Papetti, and Federico Avanzini for developing the technical setups used in the reported piano experiments and for helpful feedback about earlier versions of this chapter. Federico Fontana is also gratefully acknowledged for the original conception of the piano studies.

## **References**



# **Chapter 6 A Functional Analysis of Haptic Feedback in Digital Musical Instrument Interactions**

**Gareth W. Young, David Murphy and Jeffrey Weeter**

**Abstract** An experiment is presented that measured aspects of functionality, usability and user experience for four distinct types of device feedback. The goal was to analyse the role of haptic feedback in functional digital musical instrument (DMI) interactions. Quantitative and qualitative human–computer interaction analysis techniques were applied in the assessment of prototype DMIs that displayed unique elements of haptic feedback; specifically, full haptic (constant-force and vibrotactile) feedback, constant-force only, vibrotactile only and no feedback. From the analysis, data are presented that comprehensively quantify the effects of feedback in haptic interactions with DMI devices. The investigation revealed that the various types of haptic feedback applied had no significant functional effect upon device performance in pitch selection tasks; however, a number of significant effects were found upon the users' perception of usability and their experiences with each of the different feedback types.

## **6.1 Introduction**

Recent developments in interactive technologies have seen major changes in the way artists and performers interact with digital music technology. Computer music performers are presented with a myriad of interactive technologies and afforded near-complete freedom of expression when creating computer music or sound art. In real time, they can manipulate multiple parameters relating to digitally generated sound, effectively creating gesture interfaces and sound generators that have no real-world acoustic equivalent. When presented with such freedom of interaction, the challenge of providing performers with a tangible, transparent and expressive device for sound manipulation becomes apparent.

G. W. Young (B) · D. Murphy · J. Weeter

University College Cork, Cork, Ireland

e-mail: g.young@cs.ucc.ie; gareth.young@live.co.uk

© The Author(s) 2018 S. Papetti and C. Saitis (eds.), *Musical Haptics*, Springer Series on Touch and Haptic Systems, https://doi.org/10.1007/978-3-319-58316-7\_6

DMIs present musicians with performance challenges that are often unique to computer music. One of the most significant deviations from traditional musical instruments is the level of physical feedback conveyed by the instrument to the user. Currently, new interfaces for musical expression are not designed to be as physically communicative as acoustic instruments. Specifically, DMIs are often devoid of haptic feedback and therefore lack the ability to impart important performance information to the user [1].

In the field of human–computer interaction (HCI), the formal evaluation of an input device involves a rigorous and structured analysis, often involving the use of specific methods to ensure the repeatability of a trial. The formality of the process guarantees that the findings of one researcher can be applied and developed by other researchers. In computer music, the testing of DMIs has been highlighted as being unstructured or idiosyncratic [2–5] (see Sects. 5.3.2.2, 10.3.2, 11.4, 12.3 and 12.4). However, it is arguably challenging to accurately measure and appraise the creative and effective application of technology in a creative context. These aspects of a DMI's evaluation cannot effectively be represented by quantitative techniques alone. In response to these shortcomings, we seek to gather data via both quantitative and qualitative means, as has been seen in other studies [3]. Presented within this chapter is an experiment that evaluates and compares the major components of haptic feedback. To achieve this, the feedback mechanisms of two prototype DMIs were assessed, namely the Haptic Bowl and the Non-Haptic Bowl, which were augmented to provide vibrotactile feedback [6]. The objective of the experiment was to quantify the effect of haptic feedback in the performance of pitch selection tasks; specifically, the move time and accuracy that could be achieved with different feedback types. In addition to measuring device performance, the users' perception of usability and their overall experiences within the context of the experiment were also captured and analysed.

To formally structure the experiment, a validated framework of analysis was applied [7]. This DMI evaluation framework was designed to tackle the multiparametric nature of musical interactions while also assessing the practical design features applied in the construction of a DMI. By applying a structured evaluation model, functionality, usability and user experience data were captured while users undertook a pitch selection task. For this analysis, a pitch selection task was chosen to quantitatively measure user performance and maintain objectivity in the investigative and evaluation methodologies that were later applied. Following this, structured post-task questionnaires were administered after each stage of the experiment to elicit further information and to closely correlate quantitative with qualitative data. An empathy map for each feedback stage was then constructed to connect in-task results with post-task questioning.

In accordance with the evaluation framework, the structure of the chapter is presented as follows: each device is described and the feedback affordances they apply are reviewed; the experiment is then contextualised, stating the intentions and constraints of the study; a functionality trial is then presented that measures the move time and pitch selection accuracy of the different feedback stages; the usability and user experience data of the study are then presented; finally, the findings of the analysis and post-task data are discussed and concluded.

## **6.2 Experiment Design**

It has been observed that traditional evaluation methodologies from HCI are unsuitable for the direct evaluation of DMIs without prior contextualisation and augmentation [1]. This is mainly due to the complex coupling of action and response in musical interaction (see Sect. 2.3). These two factors operate within the tightly linked processes of a focused spatiotemporal task. Therefore, if this process is interrupted for an evaluation (e.g. for a questionnaire or thinking-aloud protocols), the participants are inevitably separated from their instantaneous thoughts and therefore from achieving their goals. Due to this, any system of analysis that is applied outside of the interaction is disconnected from the task being evaluated. Similar problems exist in other areas of study, for example in the evaluation of gaming controllers [8]. To counter this, adaptive and reflective models have been developed in HCI that concentrate on specific elements of an interaction, and these techniques have been augmented to evaluate the participants' experience in specific contexts. In the study presented, several validated HCI evaluation techniques were applied to combat the potential for task evaluation disconnect.

## *6.2.1 Functionality Testing*

To assess the functionality of the feedback elements from the Haptic and Non-Haptic Bowl devices, an experiment was devised which required participants to use the interfaces in a non-musical pitch selection task. This task was designed to generate quantitative data that could be used to accurately compare each feedback stage. From analysing the functional mechanisms of both devices, a Fitts' Law style experiment was designed.

## *6.2.2 Adapting Fitts' Law*

Fitts' Law is used in HCI to describe the relationship between movement time, distance and target size when performing rapid aimed movements (Fig. 6.1). Per this law, the time it takes to move and point to a target of a specified width (W) and distance (D) is a logarithmic function of the spatial relative error [9]. While the logarithmic relationship may not exist beyond Windows, Icons, Menus, Pointer (WIMP) systems, the same experimental procedures can be followed to produce data for analysis in an auditory context [10, 11].
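For reference, the relationship described above is commonly written in the form below. The symbols MT (movement time), a and b (empirically fitted constants) do not appear in the text and are introduced here only for illustration. The expression follows Fitts' original formulation; later HCI work often replaces the term inside the logarithm with D/W + 1, and the text does not state which variant is intended.

$$
MT = a + b \log_2\!\left(\frac{2D}{W}\right)
$$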

In the following experiment, we measured the time it took a participant to rapidly aim their movements towards a specified target pitch, which was constrained within a predefined frequency range. Essentially, physical distance was remapped to audio frequency range, where the start position corresponded to a point below 20 Hz and the target position lay within a range below 1 kHz. The target's width was predetermined as a physiological constant of 3 Hz for sinewave signals below 500 Hz, increasing by approximately 0.6% (about 10 cents) as frequency increased towards 1 kHz [12].
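The target-width rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the width is a constant 3 Hz up to 500 Hz and 0.6% of the target frequency above that (the two rules meet at 500 Hz), and that a selection counts as on target when its absolute deviation does not exceed that width. All function and constant names are hypothetical.

```python
import math

JND_HZ = 3.0          # constant target width below 500 Hz (from the text)
JND_RATIO = 0.006     # approx. 0.6% of the target frequency (from the text)
CROSSOVER_HZ = 500.0  # assumed switch point: 0.006 * 500 Hz = 3 Hz


def target_width_hz(target_hz: float) -> float:
    """Assumed target width: 3 Hz up to 500 Hz, then 0.6% of the target."""
    if target_hz <= CROSSOVER_HZ:
        return JND_HZ
    return JND_RATIO * target_hz


def deviation_cents(selected_hz: float, target_hz: float) -> float:
    """Signed pitch deviation in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(selected_hz / target_hz)


def within_target(selected_hz: float, target_hz: float) -> bool:
    """True if the absolute deviation does not exceed the assumed width."""
    return abs(selected_hz - target_hz) <= target_width_hz(target_hz)
```

Under these assumptions the width grows from 3 Hz at 500 Hz to 6 Hz at 1 kHz, and a 0.6% frequency offset corresponds to roughly 10 cents, in line with the figures quoted in the text.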

## *6.2.3 Context of Evaluation*

The evaluation context of the experiment was augmented to fit that of the performer/composer and designer's perspective. These stakeholders concern themselves with how a device works, how it is interacted with, and how the overall design of a system responds to interaction [13]. Considering this, the experiment was purposefully designed to objectively evaluate the performance of device feedback and not the musical performance of the participant. Firstly, to maintain objectivity, a feedback-focused experiment was devised and executed to quantify the device performance in pitch selection tasks. Secondly, validated post-task questionnaires were issued to quantify the usability of the device. This was achieved by employing the Single Ease-of-use Question (SEQ), Subjective Mental Effort Question (SMEQ) and NASA Task Load Index (NASA-TLX) questionnaires. Finally, interviews focusing on user experience were conducted, together with a User Experience Questionnaire (UEQ), to evaluate how the participants experienced the interaction.

Although post-task user experience questioning is problematic due to user disconnect issues, previously validated techniques were applied to accurately evaluate each feedback stage. Firstly, a preference of use question was posed to the participants to evaluate their opinion on the practical application of feedback in their own performances [14]. Secondly, the UEQ was completed to collect quantitative data about the participant's impressions of their experience [15]. This was followed by a moderately structured post-task interview formulated around specific topics. These known areas of concern in musical interactions included *learnability*, *explorability*, *feature controllability* and *timing controllability* [16]. These data were then subjected to content analyses. The content analysis topics were designed to elicit and explore critical incidents [17] that have been highlighted as problematic in the field of new instruments for musical expression.

Following the experiment, empathy mapping was applied in the context of user experience to understand and to form empathy for the end-user. This technique is typically applied to consider how a person is feeling and to better understand what they are thinking. This task was achieved by recording what the participants were thinking, feeling, doing, seeing and hearing as they were performing the task. With these data, it was possible to create a general post-experiment persona to raise issues specific to the context of the analysis. It is helpful to create *empathy maps* to reveal connections between a user's movements, their choices and the judgements they made during the task in a way that the participants may not be able to articulate post-task. Therefore, empathy mapping data were recorded during the practical stages of the functionality study to capture instantaneous information about the participants' experience without interrupting the task. Observations about what the participants said out loud, sentiments towards the device, their physical performance and how they used prior information of other devices during the experiment were recorded to validate and potentially expand upon the post-task questionnaire and interview data presented above.

## *6.2.4 Device Description: The Bowls*

For the analysis of haptic feedback in DMI interactions, prototype devices were constructed (Fig. 6.2). Each device was designed to represent a variety of feedback techniques, and several different input metaphors were initially explored. From this assortment, two devices were selected that could display the unique characteristics of haptic feedback in combination and isolation, while affording the user freedom of movement in a three-dimensional (3D) space around the device. Specifically, the Haptic Bowl and the Non-Haptic Bowl were chosen.

**Fig. 6.2** Haptic bowl (left) and Non-Haptic bowl (centre), user for scale (right)

#### **6.2.4.1 The Haptic Bowl**

The Haptic Bowl is an isotonic, zero-order, alternative controller that was developed from a console game interface [6]. The internal mechanisms of a GameTrak<sup>1</sup> tethered spatial position controller were removed and relocated into a more robust and aesthetically pleasing shell. The original Human Interface Device (HID) electronics were removed and replaced with an Arduino Uno SMD edition.<sup>2</sup> This HID upgrade reduced communication latencies and allowed for the development of further device functionality through the addition of auxiliary buttons and switches. The controller has very little in the way of performer movement restrictions as physical contact with the device is reduced to two tethers that connect the user via gloves. Control of the device requires the performer to visualise an area in three dimensions, with each hand tethered to the device within this space.

#### **6.2.4.2 The Non-Haptic Bowl**

This device is also an isotonic, zero-order controller, based upon PING)))<sup>3</sup> ultrasonic distance sensors and basic infrared (IR) motion capture (MOCAP) cameras, thus affording contactless interaction. The ultrasonic components are arranged as digital inputs via an Arduino Micro, and MOCAP cameras were created from modified Logitech C170 web cameras with visible light filters covering their optical sensors and internal IR filters removed. An IR LED embedded in a ring was then used to provide a tracking source for these MOCAP cameras. The constituent components are all contained within an aluminium shell, similar in size and shape to the Haptic Bowl. The use of these sensors matched the input capabilities of the Haptic Bowl, providing a comparable interaction. However, due to its contactless nature, this input device has fewer movement restrictions than the Haptic Bowl. Control of the Non-Haptic Bowl also requires the performer to visualise a 3D area, with input gestures captured within a comparable space to that of the Haptic Bowl.

<sup>1</sup>https://en.wikipedia.org/wiki/Gametrak (last accessed on 7 November 2017).

<sup>2</sup>https://www.arduino.cc/en/Main/ArduinoBoardUnoSMD (last accessed on 7 November 2017).

<sup>3</sup>https://www.parallax.com/product/28015 (last accessed on 7 November 2017).

## *6.2.5 Device Feedback Implementation*

In addition to the user's aural, visual and proprioceptive awareness, haptic feedback components were incorporated into the devices to communicate performance data to the user. In the Haptic Bowl, additional feedback was included in the form of a strengthened constant-force spring mechanism for both tether points. The device's spring mechanisms were strengthened to further assist in hand localisation and the positioning effects this created in relation to the main body of the instrument. Furthermore, for vibrotactile feedback, the audio output from a sinewave-generating audio module was rerouted to voice-coil actuators (see Sect. 13.2) embedded in the device's gloves. The sinewave audio signal was routed via a Bluetooth receiver embedded within the Haptic Bowl. This device was then connected to the voice-coil actuators contained within each of the device's gloves [18]. This provided sinewave feedback in real time that is directly related to the audio output, as is innately delivered in acoustic musical instrument interactions. It was also possible to apply this vibrotactile feedback to the Non-Haptic Bowl via the same gloved actuators. To achieve this, the sinewave audio output was again routed through the same type of Bluetooth speaker, but in this case, the speaker was kept external to the device. The removal of the speaker from the DMI was done to highlight the disconnect of these feedback sources in existing DMI designs.

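From combinations formulated around these feedback techniques, it was possible to create four feedback profiles for investigation:

• *haptic feedback* (constant-force and vibrotactile feedback combined);

• *force feedback* (constant-force only);

• *tactile feedback* (vibrotactile only);

• *no feedback*.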


Each feedback stage operated within the predefined requirements for sensory feedback as outlined in earlier research [19].

## *6.2.6 Participants*

Twelve musicians participated in the experiment. All participants were recruited from University College Cork and the surrounding community area. The participants were aged 22–36 (M = 27.25, SD = 4.64). The group consisted of 10 males and 2 females. All participants self-identified as being musicians, having been formally trained or performing regularly in the past 5 years.

## *6.2.7 Procedure*

All stages of the experiment were conducted in an acoustically treated studio space. The USB output from each Bowl device was connected to a 2012 MacBook Pro Retina. The serial input data from the devices were converted into Open Sound Control (OSC) messages in Processing<sup>4</sup> and outputted as UDP<sup>5</sup> information. Pure Data (Pd) then received and processed these data. Within Pd, the coordinates over the z-plane were used to create a virtual Theremin,<sup>6</sup> with the right hand controlling the pitch, and the left hand the volume. The normal operational range of both devices was altered to fit within an effective working range of 30 cm; this range lay slightly above an average waist height of 80 cm (the average height in Ireland, as of 2007, is 170 cm and the waist-to-height ratio was calculated as 0.48). A footswitch was employed by the participant to indicate the start and end of each test.

After a brief demonstration, participants were given 5 min of free play to familiarise themselves with the operation of the device. Following this, subjects were then given a further 5 min to practise the experimental procedure. The overall total time-on-task varied between participants and experiment stages, but remained within an average range of 1.5–2 h in total. Participants were presented with each feedback type in counterbalanced order (a method for controlling order effects in repeated-measures designs). For ecological validity, participants were required to wear the device-gloves throughout all experimental stages. The task consisted of listening to a specific pitch, and then seeking and selecting that target pitch with the device as quickly and as accurately as possible. The listening time required for remembering the target pitch varied between participants from 5 to 10 s. The start position for all stages was with hands resting in a neutral position at the waist. In each trial, participants used the footswitch to start and finish recording movement data. For each run of the experiment, eleven frequencies were selected in counterbalanced order across a range of 110–987.77 Hz. All frequencies in the experiment had a relative pitch value. Participants performed three runs, with a brief rest between each. The processing patch was used to capture input movement data and the time taken to perform the task; these data were then outputted as a .csv file for analysis.
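The text does not say which counterbalancing scheme was used. One common option for an even number of conditions is a balanced Latin square, sketched below in Python as an assumption rather than a description of the actual procedure. With four feedback stages it yields four presentation orders, which twelve participants can cycle through three times each. The stage labels and variable names are illustrative.

```python
def balanced_latin_square(n: int) -> list[list[int]]:
    """Return n presentation orders (rows) for an even number n of conditions.

    Each condition appears once in every position, and each condition is
    immediately preceded by every other condition exactly once.
    """
    if n % 2 != 0:
        raise ValueError("this construction needs an even number of conditions")
    # First row follows the pattern 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Every later row shifts all entries of the first row by one (mod n).
    return [[(c + shift) % n for c in first] for shift in range(n)]


STAGES = ["haptic", "force", "tactile", "no feedback"]
square = balanced_latin_square(len(STAGES))
# Twelve participants cycle through the four orders three times each.
orders = [[STAGES[i] for i in square[p % len(square)]] for p in range(12)]
```

The same idea could be applied to the order of the eleven target frequencies, although an odd number of items needs a different construction (for example, the square plus its mirror image).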

<sup>4</sup>A programming environment for the visual arts: https://processing.org/ (last accessed on 26 November 2017).

<sup>5</sup>User Datagram Protocol, a protocol for network communication.

<sup>6</sup>An early electronic musical instrument named after its Russian inventor Lev Theremin, in which the pitch and volume are controlled by the position of the performer's hands relative to a pair of antennas.

After each feedback stage of the experiment, participants were asked to complete a post-task evaluation questionnaire and informal interview. All interviews followed the same guiding question:

• What were the central elements of device feedback that resulted in task success or failure?

This directorial question was then operationalised by the following:


Throughout the interview, interview-laddering<sup>7</sup> was applied to explore the subconscious motives that led to the specific criteria being raised. A Critical Incident Technique (CIT) analysis was then applied to extrapolate upon the interview data collected. This set of procedures was used to systematically identify any behaviours that contributed to the success (positive) or failure (negative) in the specific context.

## **6.3 Results**

Functionality data were collected during the experiment so as to represent objective and quantitative measures that impartially represent the effects of feedback in audio-based exercises. Following this, the validated questionnaires and qualitative interview techniques were undertaken to gather subjective opinions from participants. Participants were not made aware of these performance data when being interviewed.

## *6.3.1 Functionality Results*

The results from the functionality evaluation can be seen in Fig. 6.3 and Table 6.1. An analysis of variance yielded no significant variations in move time for the different feedback types, with p > 0.05 for all frequencies. For the individual feedback stages, participants could target and select pitches within the predetermined target size of 3 Hz for all frequencies below and including 261.6 Hz. As expected, the accuracy of pitch selection decreased with frequency increment. Above 261.6 Hz and up to and including 523.25 Hz, the deviation from target pitch increased, but remained within the expected range. Beyond this, from 523.25 Hz up to and including 975.83 Hz, the average deviation increased further. Notably, the *no feedback* stage of the experiment exceeded the expected deviation constant of 6 Hz for this range by 3 Hz. Like move time measurements, although there were practical variations in the accuracy of target selection across all feedback stages, there was found to be no significant effect of feedback on the accuracy of frequency selection, with p > 0.05 for all feedback types.

**Fig. 6.3** Mean move time over frequency for all feedback stages

**Table 6.1** Average deviation from target for all feedback stages

<sup>7</sup>An interviewing technique where simple responses are probed and explored by the interviewer to discover the subconscious motives of the participant.

## *6.3.2 Usability Results*

For the SEQ, the participants were given the opportunity to consider their own performance and factor this into their response. Users had to fit their rating of performance based upon the range of answers available (7 in total) and respond to their interpretation of the difficulty of the task accordingly. The post-task SEQ answers can be seen in Fig. 6.4 and Table 6.2.

For the *haptic feedback* stage, a larger portion of users (42%) found that the task was somewhat difficult for them to complete, and the perceived difficulty increased for each feedback stage after this, reaching a rating of very difficult (58%) for the *no feedback* stage. When verbally questioned, participants expressed that while they were fully engaged in the task, the perceived difficulty of performance using the devices was as it would be if they were performing for the first time with any new instrument. This increase in cognitive load moved them to consider their performance more critically. Participants were unaware of their actual move time and accuracy scores at this point.

**Table 6.2** SEQ evaluation for all feedback stages

<sup>a</sup>Interquartile range

A Friedman Test revealed a statistically significant effect of feedback upon SEQ answers across the four different feedback stages: χ<sup>2</sup>(3, n = 12) = 31.75, p < 0.001. Following this, post hoc Wilcoxon Signed-Ranks tests were conducted to explore the impact of device feedback on SEQ answers. There was found to be a statistically significant effect of feedback on device scores. The effect size ranged from 0.34 to 0.45. Post hoc comparisons indicated that the score for the *no feedback* stage of the experiment was significantly different from the *haptic* and *force* stages after Bonferroni adjustment. There were found to be no significant differences between the *haptic* and *force feedback* stages, or between the *tactile* and *no feedback* stages. This indicated that the participants' perception of task difficulty was significantly different from *no feedback* when force feedback was presented in the interaction. Furthermore, tactile feedback played no role in this perception rating.
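For readers unfamiliar with the Friedman test, the sketch below shows how the statistic can be computed from a participants-by-stages table of ratings, using only the Python standard library. It is an illustrative sketch, not the authors' analysis script: the ratings are hypothetical, the function names are invented here, and the p-value helper is a closed form that is valid only for three degrees of freedom (four conditions).

```python
import math


def average_ranks(row):
    """Rank one participant's scores (1 = lowest), averaging tied ranks."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_chi2(scores):
    """Tie-corrected Friedman statistic; rows = participants, cols = conditions."""
    n, k = len(scores), len(scores[0])
    ranked = [average_ranks(row) for row in scores]
    rank_sums = [sum(r[j] for r in ranked) for j in range(k)]
    stat = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)
    # Tie correction: sum of t^3 - t over every group of t tied scores in a row.
    ties = 0
    for row in scores:
        for value in set(row):
            t = row.count(value)
            ties += t ** 3 - t
    return stat / (1.0 - ties / (n * k * (k * k - 1)))


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Hypothetical 7-point ratings (not the study's data): 12 identical rows.
ratings = [[5, 4, 3, 2]] * 12
chi2 = friedman_chi2(ratings)   # equals n * (k - 1) = 36 when all rows agree
p_value = chi2_sf_df3(chi2)
```

With twelve participants and four stages the statistic cannot exceed 36, which gives a sense of scale for the reported value of 31.75.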

In comparison to the SEQ, the SMEQ presented a near-continuous response choice for the participants to choose from (Fig. 6.5). Theoretically, this allowed the participants to be more precise regarding their estimation of the device's usability. The premise of this scale was to elicit an indication of the user's thoughts towards the amount of mental effort they exerted during the task. The mean value of the SMEQ answers for each feedback type can be seen in Table 6.3. The results support the usability analysis of the SEQ; however, this scale measured the amount of effort the participants felt they invested rather than the amount of effort demanded from them.

**Fig. 6.5** Boxplots representing mean SMEQ answers for each unique feedback element

**Table 6.3** SMEQ evaluation for all feedback stages

A repeated-measures ANOVA was conducted to compare scores on the SMEQ scale. There was found to be a significant effect for feedback: F(3, 9) = 11, p = 0.002, with partial η<sup>2</sup> = 0.79. The post hoc comparisons indicated that the score for the *no feedback* stage of the experiment was significantly different from the *haptic*, *force* and *tactile* stages. There was found to be no significant difference between *haptic* and *force feedback* stages.
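As a consistency check on the reported effect size, partial η<sup>2</sup> can be recovered from an F statistic and its degrees of freedom using the standard identity below. The subscripted names are introduced here for illustration; applying the identity to the values above reproduces the reported figure.

$$
\eta_p^2 = \frac{F \cdot df_{\text{effect}}}{F \cdot df_{\text{effect}} + df_{\text{error}}}
         = \frac{11 \times 3}{11 \times 3 + 9}
         = \frac{33}{42} \approx 0.79
$$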

Following the evaluation of perceived effort, the participant's subjective workload was recorded with a paper and pencil NASA-TLX assessment questionnaire. In this, the total workload is divided into six TLX subscales, the results of which can be seen in Fig. 6.6. The first indicator in the NASA-TLX subscale required the user to signify how demanding they found the task in terms of its complexity. The observed results denote that a somewhat small amount of mental and perceptual activity was required, indicating that the task was simple to complete for all feedback stages. Next, the mean physical demand of the task was measured, showing that the participants found the task relatively easy to complete, and that a reasonable amount of physical activity was demanded from them in completion of the task. In terms of temporal demand (the time pressure felt in performing the task), the mean user rating of the experiment shows that the pace of the task was realistic and that participants were not rushed, had plenty of time to complete the task without pressure, and that the task elements were presented within a realistic time frame. In the self-evaluation of performance in the TLX questionnaire, participants indicated that they were relatively unsatisfied with their own performance.

**Fig. 6.6** NASA-TLX subscale ratings of usability for each unique feedback element

The users' satisfaction with the success of their performance corroborates the earlier findings of negative self-satisfaction in performance of the task. It also highlights some difficulties in the completion of the task and that a raised mental awareness was required during its execution. Notably, all feedback stages were rated equally negatively, with no significant effect of feedback. Therefore, although a negative evaluation of performance was recorded, there was no distinction between the performance of the different feedback stages as was present in the SEQ and SMEQ. In contrast to the self-evaluation of performance, participants indicated that they worked only somewhat hard mentally and physically to accomplish their level of performance. This indicated that the participants did not feel that they had worked particularly hard to reach their overall level of performance, even though an unsatisfactory evaluation of performance was measured.

Next, participants recorded that they were not irritated or stressed by the task. The TLX measured relatively low frustration levels, weighting towards a relaxed attitude during the experiment. These results indicated that although participants were relatively unsatisfied with their performance, they were not stressed or unhappy. Finally, a mean overall "raw TLX" measure of workload was calculated to represent the overall TLX rating of each feedback type. Due to time restrictions, a pairwise comparison of each dimension was not deemed necessary and thus not undertaken.
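The "raw TLX" score mentioned above is conventionally the unweighted mean of the six subscale ratings, which is what remains when the pairwise weighting step is skipped. A minimal Python sketch follows, assuming each subscale is rated from 0 to 100; the example ratings are hypothetical and are not the study's data.

```python
from statistics import mean

TLX_SUBSCALES = (
    "mental demand", "physical demand", "temporal demand",
    "performance", "effort", "frustration",
)


def raw_tlx(ratings: dict) -> float:
    """Unweighted mean of the six NASA-TLX subscale ratings (0-100 each)."""
    missing = [name for name in TLX_SUBSCALES if name not in ratings]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    return mean(ratings[name] for name in TLX_SUBSCALES)


# Hypothetical ratings for one participant in one feedback stage.
example = {
    "mental demand": 35, "physical demand": 30, "temporal demand": 25,
    "performance": 60, "effort": 40, "frustration": 20,
}
overall = raw_tlx(example)  # (35 + 30 + 25 + 60 + 40 + 20) / 6 = 35
```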

A repeated-measures ANOVA was conducted to compare scores on the different feedback stages, and although there were some noticeable variations in the mean scores for each category and feedback type, no significant effect of feedback was recorded at the p < 0.05 level for any category except *effort*: F(3, 9) = 4.22, p = 0.04, partial η<sup>2</sup> = 0.58. Post hoc testing for *effort* revealed that there was a significant difference in mean scores for perceived effort between the *no feedback* and *tactile feedback* stages of the experiment (mean difference = 8.42, p = 0.046). This indicated that participants regarded the different feedback types as equally usable across all TLX categories except for *effort*, where there was a small but significant difference in scores between the *tactile* and *no feedback* stages.

## *6.3.3 User Experience Results*

The final stage of the functionality analysis incorporated a post-task assessment of the users' experiences during the experiment. A pre-existing questionnaire was used to measure user experience quickly, simply and as immediately as possible. Six critical aspects of experience were captured via the UEQ: *attractiveness*, *perspicuity*, *efficiency*, *dependability*, *stimulation* and *novelty* (Fig. 6.7). The overall internal consistency of the user experience scales was acceptable, with α = 0.88. However, poor internal consistencies for some of the individual feedback stages were observed, highlighting some disparity between participant answers. The scale ranged from −3 (very bad) to +3 (very good). However, maximum ratings have previously been reported as unlikely in user studies [15]; therefore, a more restrictive range was applied to compensate for different answer tendencies of the participants. For user experience measures on this scale, mean values between −0.8 and 0.8 are representative of a neutral evaluation of the corresponding dimension. Values greater than 0.8 represent a positive evaluation, and values below −0.8 represent a negative evaluation.
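The interpretation thresholds described above amount to a simple three-way classification of each scale mean. The Python sketch below illustrates this; the function name is hypothetical, and the boundary values of exactly ±0.8 are treated as neutral, which is an assumption since the text gives the neutral band as lying between those limits.

```python
def ueq_evaluation(scale_mean: float) -> str:
    """Classify a UEQ scale mean on the -3..+3 range using the 0.8 cut-offs."""
    if not -3.0 <= scale_mean <= 3.0:
        raise ValueError("UEQ scale means lie between -3 and +3")
    if scale_mean > 0.8:
        return "positive"
    if scale_mean < -0.8:
        return "negative"
    return "neutral"
```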

A repeated-measures ANOVA was conducted to compare UEQ scores, revealing that there were statistically significant variations in user experience answers for the *efficiency*, *dependability* and *novelty* category ratings at the p < 0.05 level. However, pairwise comparisons of *novelty* with adjustments for multiple comparisons (Bonferroni) revealed no significant differences between the feedback stages. The categories of *efficiency* and *dependability* specifically relate to the user's experience of the ergonomic quality aspects that were applied in the design of the Bowl devices (Fig. 6.8). Participants evaluated their experience of device efficiency in the chosen task as being quick and organised for *haptic feedback*, reducing towards a more neutral rating as feedback was reduced in the order of *force*, *tactile* and *no feedback*, respectively. Similarly, the participants' experience of dependability of the feedback stages showed the same downwards trend, with experience ratings of predictable and secure behaviour for *haptic* and *force feedback* being high and a much more neutral rating for *tactile* and *no feedback*.

**Fig. 6.7** Boxplots representing UEQ results for each unique feedback stage

**Fig. 6.8** Boxplots representing UEQ efficiency and dependability for each unique feedback stage

From these findings, participants rated the different feedback stages relatively equally for the categories of *attractiveness*, *perspicuity*, *stimulation* and *novelty*. Post hoc comparisons with Bonferroni adjustment indicated that the mean score for *efficiency* for *force feedback* was significantly different from the *no feedback* stage. In addition, the same test revealed that there were statistically significant effects between dependability ratings for *haptic* and *force feedback* and *tactile* and *no feedback*.


**Table 6.4** Participant preference of use

This significance highlighted a perceived efficiency rating difference between the feedback stages of *force*, *tactile* and *no feedback*. These perceived differences are interesting due to the lack of difference observed in performance.

## *6.3.4 Interview Data*

Participants were asked whether they would like to use each feedback stage to perform with outside of the experiment. Participants' answers varied across the different feedback stages (Table 6.4). Most participants were pleased with their evaluation of feedback performance for each device and thought that they would use the device outside of the experiment. However, some users also indicated that they did not have an opinion about usage preference, as they would not normally use a computer interface to make music. When questioned further, users indicated that they were not particularly inspired by the experiment methodology, but suggested that if they could expand or explore the devices' parameters further they might have rated it more favourably. The estimated usage ratings for the different device feedback stages noticeably reduced from the *haptic* stage through to the *no feedback* stage (Fig. 6.9). Participants who were not accustomed to performing with computer interfaces expressed that they felt increasingly negative towards devices as feedback was reduced.

A Friedman test revealed a statistically significant difference in device use answers across the four different feedback stages, χ²(3, n = 12) = 25.05, p < 0.001. Following this, post hoc Wilcoxon signed-rank tests were conducted to explore the impact of device feedback on estimated use answers. There was a statistically significant difference at the p < 0.0125 level in device scores between the *haptic* stage and all other feedback stages, with medium-to-large effect sizes ranging from 0.24 to 0.44. There were also significant differences in results between the *no feedback* stage and the *force* and *tactile feedback* stages. This suggests that haptic feedback can act as a preferential feature when choosing between multiple DMIs in composition or music performance.
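As an illustration of the test reported above, the following minimal sketch computes the Friedman chi-square statistic from a table of per-participant ratings and derives a Bonferroni-adjusted significance threshold. The ratings used in the usage note are invented for illustration and are not the study's data; the function names are hypothetical, and the statistic is computed without a tie correction, which statistical packages may apply. The threshold of 0.0125 is consistent with a nominal level of 0.05 divided by four, although the chapter does not state how many comparisons were counted.

```python
def average_ranks(values):
    """Rank values from 1..k, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def friedman_chi_square(ratings):
    """Friedman statistic for n participants (rows) by k conditions (columns).

    No tie correction is applied (simplifying assumption).
    """
    n = len(ratings)
    k = len(ratings[0])
    rank_sums = [0.0] * k
    for row in ratings:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def bonferroni_alpha(alpha, comparisons):
    """Per-comparison significance threshold after Bonferroni adjustment."""
    return alpha / comparisons
```

For example, if twelve hypothetical participants all ranked four feedback stages in the same order, `friedman_chi_square([[4, 3, 2, 1]] * 12)` reaches the maximum possible value for this design, n(k − 1) = 36, with k − 1 = 3 degrees of freedom.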

**Fig. 6.9** Diverging stacked bar chart for preference of use evaluation

Participants were asked open-ended questions to gauge their opinions about the different feedback stages. These questions were then expanded upon in an interview, with care taken not to bias the participants' responses. A CIT analysis was conducted based upon the participants' answers to record the users' attitudes to the different feedback types. Content analysis techniques were then applied to categorise the responses into areas of concern; these included: *personal preference*, *playability*, *comparison to other musical instruments*, *learnability*, *comparison to other DMIs*, *explorability* and *tempo*.

From the interview transcripts, coherent thoughts and single statements were identified and extracted. After redundancy checking, a total of 322 single statements were counted (M = 80.5, SD = 15.77, per feedback stage). Following this, three researchers were independently employed to iteratively classify this pool of statements as either "positive" or "negative" performance evaluations. Although this process was initially reductive, a second analysis of the data was used to develop a bottom-up categorical system of classifications to known areas of concern in musical interactions: *learnability*, *explorability*, *feature controllability* and *timing controllability* [16].

Participants were inclined to be positive about the *haptic feedback* stage of the experiment and were pleased with the amount of feedback that was delivered (see Table 6.5). It was noted that participants were more vocal about their experiences at this stage than for the *tactile* and *no feedback* stages. The CIT highlighted *personal preference* as the most reported aspect of user experience at this stage. These comments highlighted the overall enjoyment of participants when interacting with the device. However, while many comments were positive, participants highlighted some negative ergonomic aspects of the interaction as well. Comments about *playability* mainly focussed on interaction difficulties during the task. However, many remarks made in the *playability* category were positive. These demonstrated an appreciation for the increased performance information provided by haptic feedback. Participants expressed a partiality for a familiar feel to the interface, which they felt increased their attention to their actions. This showed that if care was taken to provide haptic feedback in DMI designs, the end-user may gain an increased sense of awareness of their interaction, without involving overly complicated mechanisms or device processing power. The *comparison to other musical instruments* category produced several interesting responses in comparison to the other feedback stages. Specifically, comments that compared the device directly with acoustic instruments provided an interesting insight into the combination of force and tactile feedback. *Learnability* was seen more positively here than for the *force* and *tactile feedback* alone. These findings have been observed in other research areas, most notably in [20]. The category containing the most negative remarks was *tempo*. The negative comments indicated that a tempo-based task would be very problematic to perform, while the positive comments indicated that it would be challenging to accomplish.

**Table 6.5** Content analysis for haptic feedback

Table 6.6 shows the results of the content analysis of the *force feedback* stage of the experiment. This stage of the experiment received the same number of positive comments as the *haptic* stage; however, it also received more negative comments. As with the *haptic feedback* stage, *force feedback* received noticeably more comments than the *tactile* and *no feedback* stages of the experiment. Again, the category that contained the most comments was the *personal preference* category; however, the categories following this varied from the *haptic feedback* stage.

The *personal preference* category of the *force feedback* stage contained comments discussing the novelty of the design and how the users found it interesting to use. There were also several positive comments focussing on simplicity and accessibility of the interface. However, some comments fixated negatively on the way pitch selection was achieved and the quality of sound reproduction from the small embedded speaker. Participants were more inclined to refer to other instruments in the *comparison to other musical instruments* category compared to the *haptic feedback* stage; however, some comments were critical of the lack of input gestures available to use. This further highlighted the restrictive nature of functionality-focused experimentation. Comments in the *playability* category discussed the implication of physical requirements for playing the device, either praising its accessibility or commenting on the interface requirements for interaction. The group containing the most negative remarks was again the *tempo* category. Comments made here referred to issues of envelope attack time, jumps in pitch and concerns about accuracy.

**Table 6.6** Content analysis for force feedback

**Table 6.7** Content analysis for tactile feedback

Table 6.7 shows the results of the content analysis of the *tactile feedback* stage. Participants were more conservative with comments, suggesting that there were not as many aspects of this feedback stage that were worthy of note. However, this may be attributable to the conservative nature of the participant pool. The categories that contained the most responses were *personal preference*, *comparison to other musical instruments* and *playability*.

The *personal preference* category contained the largest number of participant comments. This category also contained the most positive comments. These comments mainly reflected how the participants felt about the interaction and their curiosity about tactile feedback. However, some participants viewed the interaction as unpredictable and inaccurate. Comments in the *comparison to other musical instruments* category discussed how the interactions compared with the participants' own instruments and compared accuracy between the two types of instrument. The *playability* category contained the highest number of negative comments. The participants were particularly focused on their own perception of lack of accuracy and precision in their movements.

**Table 6.8** Content analysis for no feedback

Finally, the results from the *no feedback* stage of the experiment can be seen in Table 6.8. This feedback stage yielded a high number of comments about *personal preference*, *comparison to other DMIs* and *playability* issues. The negative *personal preference* comments highlighted the participants' frustrations at the lack of feedback provided. Positive comments were directed to the novelty and fun factor of the interaction. Participants were more inclined to compare the *no feedback* stage of the experiment with other DMIs, as seen in the *comparison to other DMIs* category. Many of the comparisons were negative, focussing again on the perceived inaccuracy of their movements. Positive comments highlighted the differences to other DMI interaction types. As with the *tactile feedback* stage of the experiment, the *playability* category contained the most negative comments. These comments mainly focused on the perceived accuracy of the interaction, with a few comments about creative application.

## *6.3.5 Empathy Mapping*

Empathy mapping results are represented in Figs. 6.10, 6.11, 6.12 and 6.13, showing little deviation from the actions observed during the functional task and the verbal explanations of answers in the interview; this serves to further validate the analysis techniques applied.

**Fig. 6.10** Empathy mapping for haptic feedback

**Fig. 6.11** Empathy mapping for force feedback

**Fig. 6.12** Empathy mapping for tactile feedback

**Fig. 6.13** Empathy mapping for no feedback

## **6.4 Discussion**

In the functional analysis, participants could select the specific pitches with observable increases in mean move time across the four stages of feedback. However, the statistical analysis of mean move time variance between each feedback stage showed no significant effect for feedback. This indicated that, although there was evidence of some practical differences between feedback types, haptic feedback and its derivatives had no consistent effect upon move times in pitch selection tasks. This finding supports the argument that haptic feedback has no significant effect upon a device's performance in functional device evaluation exercises. Furthermore, the accuracy of pitch selection across the different feedback stages also varied with frequency: mean deviation from the target frequency varied over three distinct frequency bands. For waveforms below 500 Hz, the predetermined physiological constant was maintained, with frequencies above this threshold increasing in deviation by approximately 0.6%. The mean accuracy figures for each feedback stage showed no significant differences; however, there was again evidence of practical differences. These findings further support an argument that haptic feedback may have no significant quantitative effect upon a device's performance in auditory pitch selection exercises.

For the SEQ, it was found that when participants were given the opportunity to evaluate their own performance, they rated themselves differently for each feedback type. Participants evaluated the difficulty of the task with *tactile* and *no feedback* as being more challenging than with *haptic* and *force feedback*. There was no significant difference between the *haptic* and *force feedback* stages or the *tactile* and *no feedback* stages, indicating that tactile feedback had no effect upon the participants' perception of ease-of-use. However, from these observations, force feedback can be seen as having some positive effect. Although the quantitative measures of performance indicated that there was no significant difference in move time and accuracy, participants were inclined to be more self-critical of their performance than necessary when feedback was altered or removed. Many participants indicated that, although they found the task difficult across all stages, their level of engagement varied, as it would if they were performing for the first time with any new acoustic instrument.

The SMEQ further supported these findings, with ratings showing that some amount of effort to a fair amount of effort was required to perform the exercises. However, the SMEQ presented a different focus than that of the SEQ, as it measured the perceived amount of mental effort applied during the task. The results showed that the amount of mental effort required increased as feedback was removed, although the actual quantified performance of the different feedback stages did not significantly differ. These differences were significant between the *haptic* and *force feedback* stages and the *no feedback* stage. *Tactile feedback* did not differ significantly from any other stage. Furthermore, the perception of increased mental effort was also indicated as being a significant factor during the user experience analysis. From analysing the functional data and comparing them to the participants' perception of mental effort and ease-of-use, it was observed that force feedback was the most influential feedback type, with no significant effect observed for tactile feedback. However, with the addition of tactile feedback to force feedback, there were also no detrimental effects on the user's performance ratings.

The overall raw usability testing revealed no significant effect of feedback across all feedback stages; however, the data collected did reveal some interesting results. For example, the self-measure of performance on the NASA-TLX scale was found to be reasonably poor for all feedback types. This indicated that participants were equally negative about how successful and satisfied they were with their performance across all feedback types. The results also indicated that haptic feedback and its constituent parts each played some part in the reduction of participants' perception of mental demand. The combination of TLX, SEQ and SMEQ usability ratings indicates a general level of dissatisfaction with performance for each feedback type.

The UEQ data from the study highlighted a significant difference between the users' experience of efficiency and dependability across all feedback stages. For efficiency ratings, significant differences were observed between *haptic* and *force feedback* and *tactile* and *no feedback* ratings. This denoted that the evaluation of the participants' experience of work performed to total effort expended was not affected by tactile feedback, but by force feedback alone. Similarly, the participants' appraisal of dependability displayed the same evaluation characteristics. The participants' experience and assessment of device reliability showed that they felt that the *tactile* and *no feedback* stages were less reliable than the *haptic* and *force* stages, regardless of there being no measurable effect of feedback in accuracy and move time.

Subsequently, critical incidents for each feedback stage were assessed. Overall, the CIT analysis revealed some interesting trends. The most obvious of these was the decrease in positive comments and the increase in negative comments made as feedback was removed from the interaction. Additionally, participants were particularly more vocal about their personal preferences when interacting with each feedback stage. This trend highlighted the importance of performer individuality and prior experiences when designing, building and using a DMI device with feedback. This would imply the need for a more explorative investigation methodology in the evaluation of experience. This aspect could be further expanded upon in user case studies and involve the further consideration of creative applications in its analysis.

With the specific matching and categorisation of the devices and the quantitative and qualitative data recorded during functionality testing, the results of the experiment showed that the effect of haptic feedback and its derivatives could be measured in the operation of a DMI, with accurate data measures. These findings denoted interesting results for the different types of feedback displayed to the user, and although there was no direct effect upon the quantitative performance of the DMI, feedback may still be revealed to have some positive influence upon the user's perceptual experience when applied in note-level-control metaphors, musical exercises, and explorative or creative contexts.

The discipline of HCI has a wide range of evaluation frameworks for the appraisal of digital technology as applied to simple, multiparametric tasks. This includes evaluation techniques that are designed to discover issues that arise in unique applications of technology, such as the effects of haptics in DMI design. For the appraisal of complex devices, HCI evaluation techniques can be incorporated in the evaluation of usability and user experience. In addition to this, the subject of human computing (or human-centred computing) can also be used to evaluate the user's intentions and motivations in the application of technology in creative contexts. As has been presented here, an appraisal of function, as a task-focused approach, presents metrics that are easy to measure and quantify. However, in the creation of music, the application of technology relies upon the user's previous training and experiences to accurately express the musicians' inner thoughts and intentions.

It is therefore proposed that, although DMIs require functional testing to highlight potential usability issues, a comprehensive analysis should also include the evaluation of real-world situations to accurately capture and evaluate all aspects of an interaction. Thus, to expand our investigation of haptics into the real world, a music-focused analysis should also be undertaken. This idea emphasises the "third paradigm" concept, which includes the gathering of information relating to culture, emotion and previous experience. Our results show task-focussed evaluations are indeed a necessary precursor to experience-focussed assessment. However, task-focussed evaluations, when carried out in isolation, do not present sufficient information about the user or device in real-world applications of such technology.

Interaction information pertaining to acoustic musical instrument design already exists; therefore, data can be measured and used in DMI interaction design to provide a sense of realism and embodiment to virtual or augmented instruments or expanded upon to fit new design types [21]. Many digital musicians are recognised for their creativity, innovation and adaptation in the design and construction of DMIs; however, these digital instruments are often still devoid of haptic feedback. It is possible to reconstruct the operating principles of acoustic instruments and apply them to DMIs, as is seen in augmented instruments and DMIs that replicate the playing style of an acoustic instrument. For a performer, however, the emptiness of assignable "button bashing" may be seen as a negative characteristic. DMIs offer freedoms to musicians that are near endless, but digital music performers often also play conventional instruments, highlighting the need to experience the creation of music with all senses engaged.

If multimodal collocations are possible within DMI design, it should also be possible to simulate the haptic experience of an acoustic performance. Sound can be created electronically with the freedoms afforded through digital sound generation and with the combined information of the interaction response being fed back with meaning comparable to that of an acoustic instrument. Sound can be digitally created and manipulated by the artist, and a deeper sense of craft can potentially be realised. Computer musicians need to be able to experience consistency, adaptability, musicality and touch-related sensations to experience the physiological and psychological occurrences outlined within each of the research conclusions presented here.

## **6.5 Conclusions**

In this chapter, it has been seen that the addition of haptics to DMI feedback archetypes enhances the user experience, but does not appear to impact on the effectiveness (move time) or accuracy of the functional elements of DMIs. Additionally, from the analysis of feedback in auditory interactions, it has been demonstrated how an HCI-informed framework can be applied in the evaluation of DMI design. Specifically, it was observed how a device's analysis can be informed by HCI techniques that are applied in the evaluation of general computing and computing for unique or creative applications. Regarding the experimental results presented here, the functional capacity of haptic, force, tactile and no feedback afforded to users in tasks that require the selection of specific frequencies was quantified and evaluated. The accumulation of differences observed within this analysis revealed influential factors of information feedback on the user's experiences in functional application contexts.

From the data gathered, DMI feedback appeared to be influential on several context-dependent levels. In the study, no significant effect of feedback was found upon the quantifiable performance capacities of the tested feedback stages. However, when questioning the participants further, important inequalities were discovered in the perception of usability and experience when completing the task. Within these areas, the musician's perception of performance was found to be more favourable with the presence of both tactile and force feedback. Therefore, it can be concluded from this experiment that haptic feedback has some positive effect upon many perceptual experiences in the application of DMI technology and should be further investigated in the field.

It is expected that the study of interactions between performers and digital instruments in a variety of contexts will continue to be of research interest. Research on digital musical instruments and interfaces for musical expression will continue to explore the role of haptics, incorporating user experience and the frameworks that are constructed to quantify the relationship between musical performers and new musical instruments. The complexities of these relationships are further complicated by the skills of musicians and are far greater and more meaningful than a physically stimulating interaction.

It has been shown in this work that digital musical instrument design and evaluation methodologies can be applied in the study of interactions between musicians and instruments. However, it is suggested that emergent DMI systems require further measures for an accurate appraisal of the user's experience when applying the device in a musical context [22]. In a traditional HCI analysis, a device is evaluated in a specific context and the evaluation methods are expert-based heuristic evaluations or user-based experimental evaluations. Only by determining context is it possible to interpret correctly the data gathered. Therefore, it is suggested that DMI-specific functionality, usability and user experience evaluation methods should be developed.

The work presented has only begun to explore the possibilities of haptic feedback in future DMI designs. The experiment endeavoured to present evidence of some influence that haptic feedback has on a user's perception of functionality, usability and user experience. Beyond this, future research goals should include long-term studies, and the development of tools to assist in the creation of DMI designs, to allow designers to experiment with different gestural interface models. Within this space, composers, performers and DMI designers will be able to explore the affordances of technologies in the creation of new instruments for musical expression.

## **References**



# **Chapter 7 Auditory-Tactile Experience of Music**

**Sebastian Merchel and M. Ercan Altinsoy**

**Abstract** We listen to music not only with our ears. The whole body is present in a concert hall, during a rock event, or while enjoying music reproduction at home. This chapter discusses the influence of audio-induced vibrations at the skin on musical experience. To this end, sound and body vibrations were controlled separately in several psychophysical experiments. The multimodal perception of the resulting concert quality is evaluated, and the effect of frequency, intensity, and temporal variation of the vibration signal is discussed. It is shown that vibrations play a significant role in the perception of music. Amplifying certain vibrations in a concert venue or music reproduction system can improve the music experience. Knowledge about the psychophysical similarities and differences of the auditory and tactile modality helps to develop perceptually optimized algorithms to generate music-related vibrations. These vibrations can be reproduced, e.g., using electrodynamic exciters mounted to the floor or seat. It is discussed that frequency shifting and intensity compression are important approaches for vibration generation.

## **7.1 Introduction**

Several chapters in this book discuss the influence of haptic cues provided by instruments to musicians. Usually, the forces and vibrations at the skin are directly excited by a physical contact with the instrument. However, the radiated sound itself can stimulate the surface of the human body too. This is true for musicians and music listeners alike. The main hypothesis to be evaluated in this chapter is that vibrations at the listener's skin might be important for the perception of music. If the vibratory component is missing, the perceived quality might change, e.g., for a concert experience. From another perspective, the perceived quality of a concert hall or a conventional audio reproduction system might be improved or impaired by adding vibrations. These vibrations can be excited directly via the air or via the surfaces that are in contact with the listener. This study focuses on seat vibrations, such as those that can be perceived in a classical chamber concert hall. Measurements in an exemplary concert hall and a church confirmed the existence of seat vibrations during real music performances [27]. If a kettledrum is hit or the organ plays a tone, the ground and chair vibrate. The vibratory intensity and frequency spectra are dependent on various factors, e.g., room modes or construction parameters of the floor. Nevertheless, in many cases, the concert listener may not recognize the vibrations as a separate feature because the tactile percept is integrated with the other senses (e.g., vision and hearing) into one multimodal percept. Even if the listener is unaware of vibrations, they can have an influence on recognizable features of the concert experience, e.g., the listener's presence or envelopment, parameters that are of vital importance in determining the quality of concert halls [8].

S. Merchel (✉) · M. E. Altinsoy

Institut für Akustik und Sprachkommunikation, Technische Universität Dresden, Helmholtzstr. 18, 01069 Dresden, Germany e-mail: sebastian.merchel@tu-dresden.de

M. E. Altinsoy e-mail: ercan.altinsoy@tu-dresden.de

© The Author(s) 2018 S. Papetti and C. Saitis (eds.), *Musical Haptics*, Springer Series on Touch and Haptic Systems, https://doi.org/10.1007/978-3-319-58316-7\_7

Unfortunately, there is no vibration channel in conventional music recordings. Therefore, it would be advantageous if a vibration signal could be generated using the information stored in existing audio channels. This approach might be reasonable because the correlation between sound and vibration is naturally strong in everyday situations.

Two pilot experiments were conducted and described by Merchel et al. [24, 25], who investigated the influence of seat vibrations on the overall quality of the reproduction of concert DVDs. Low-pass-filtered audio signals were used for vibration generation through a shaker mounted to a seat. In many cases, participants preferred when vibrations were present, reporting that something was missing if seat vibration was turned off. However, different complaints were reported: it was stated that the high-frequency vibrations were sometimes prickling and therefore unpleasant; several participants reported that some vibrations were too strong and that others were too weak or completely missing; it was also noted that the sound generated by the vibration chair at higher frequencies (indeed, a side-effect) was disturbing. In the aforementioned experiments, a precisely calibrated vibration actuator was applied that was capable of reproducing frequencies from 10 to 200 Hz and higher. In practical applications, smaller and less expensive vibration actuators would be beneficial; however, these shakers are typically limited to a small frequency range around a resonance frequency or they are not powerful enough for the present application.
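The basic idea of deriving a vibration signal by low-pass filtering the audio can be sketched with a simple one-pole filter. This is only an illustrative stand-in: the text does not specify the filter type, order, or cutoff used in the pilot experiments, and the function name, the 200 Hz cutoff, and the 48 kHz sampling rate in the usage note are assumptions.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    """First-order recursive low-pass filter (illustrative, not the authors' filter).

    Each output moves a fraction `a` of the way from the previous output
    towards the current input; a lower cutoff gives a smaller `a`.
    """
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    y = 0.0
    out = []
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out
```

With the assumed values `cutoff_hz=200` and `sample_rate_hz=48000`, a constant input passes through essentially unchanged once the filter has settled, whereas a component at half the sampling rate is strongly attenuated. A first-order filter rolls off gently, so a practical system would probably need a steeper design to suppress the audible high-frequency content that participants complained about.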

Our work aims to broaden the understanding of the coupled perception of music and vibration by addressing the following questions: Can vibration-generation algorithms be obtained that result in an improved *overall quality of the concert experience* compared with reproduction without vibration? Which algorithms are beneficial in terms of *silent and simple* vibration reproduction? In this chapter, algorithms are described that were developed and evaluated to improve music-driven vibration generation, taking into account the above questions and complaints. The content is based on several papers [3, 27, 28] and the dissertation of the first author with the title 'Auditory-Tactile Music Perception' [23] with kind permission from Shaker Verlag.

## **7.2 Experimental Design**

In this section, the applied music stimuli, the experimental setup, participants, and procedure are described. Different vibration-generation approaches will be discussed and evaluated in the following section.

## *7.2.1 Stimuli*

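To represent typical concert situations for both classical and modern music, four sequences were selected from music DVDs [7, 21, 45, 46] that included significant low-frequency content. A stimulus duration of approximately 1.5 min was chosen to ensure that the participants had sufficient time to become familiar with it before providing quality judgments. The following sequences were selected:

- BACH: Toccata in D minor (organ)
- VERDI: Dies Irae (double choir and orchestra)
- DVORAK: Slavonic Dance No. 2 in E minor (orchestra)
- BMG: Sing Along, performed by the Blue Man Group (pop music)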


The first piece, Toccata in D minor, is a well-known organ work that is referred to as BACH. A spectrogram of the first 60 s is plotted in Fig. 7.1a, which shows a rising and falling succession of notes covering a broad frequency range, as well as steady-state tones with a rich overtone spectrum that dominate the composition. Strong vibrations would be expected in a church for this piece of music [27]. The second sequence, Dies Irae, abbreviated as VERDI, is a dramatic composition for double choir and orchestra. A spectrogram is plotted in Fig. 7.1b: Impulsive fortissimo sections with a concert bass drum, kettledrum, and tutti orchestra alternate quickly with sections dominated by the choir, bowed instruments, and brass winds. The sequence is characterized by strong transients. The third stimulus, Slavonic Dance No. 2 in E minor, is referred to as DVORAK, and is a calm orchestral piece, dominated by bowed and plucked strings. Contrabasses and cellos continuously generate low frequencies at a low level (see spectrogram in Fig. 7.2a). The fourth sequence, Sing Along, is a typical pop music example performed by the Blue Man Group, which is further shortened to BMG. The sequence is characterized by the heavy use of drums and percussion. These instruments generate transient content at low frequencies, which can be seen in the corresponding spectrogram in Fig. 7.2b. Additionally, a bass line can be easily identified.

To generate a vibration signal from these sequences, the low-frequency effects (LFE) channel and the three frontal channels were summed. The surround channels contained no low-frequency content in any of the sequences. Pure Data (Pd) was used for this purpose. During the process, several signal processing parameters were varied; a detailed description of the different approaches is presented in Sect. 7.3.
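The channel summation can be sketched in a few lines of Python. This is only an illustration and not the Pd patch used in the study; the channel order in `CHANNELS`, the function name, and the unity summation gains are assumptions.

```python
import numpy as np

# Hypothetical 5.1 channel order; the layout on an actual DVD may differ.
CHANNELS = ("L", "R", "C", "LFE", "Ls", "Rs")

def vibration_source(audio):
    """Sum the three frontal channels and the LFE channel into one mono signal.

    audio: array of shape (n_samples, 6) with columns ordered as in CHANNELS.
    The surround channels (Ls, Rs) are ignored.
    """
    audio = np.asarray(audio, dtype=float)
    used = [CHANNELS.index(name) for name in ("L", "R", "C", "LFE")]
    return audio[:, used].sum(axis=1)
```

The resulting mono sum is the input to all vibration-generation approaches discussed in Sect. 7.3.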

**Fig. 7.1** Spectrograms of the mono sums for 60 s from the BACH and VERDI sequences. The short-time Fourier transforms (STFTs) were calculated with 8192 samples using 50% overlapping Hann windows

**Fig. 7.2** Spectrograms of the mono sums for 60 s from the DVORAK and BMG sequences. The short-time Fourier transforms (STFTs) were calculated with 8192 samples using 50% overlapping Hann windows

## *7.2.2 Synchronization*

For a good multisensory concert experience, it is recommended that input from all sensory systems should be integrated into one unified perception. Therefore, the delay between different sensory inputs is an important factor. Many published studies have focused on the perception of synchrony between modalities, mostly related to audiovisual delay (e.g., [12, 38]). Few studies have focused on the temporal aspects of acoustical and vibratory stimuli. These studies have differed in the types of reproduced vibration (vibrations at the hand, forearm, or seat vibration), types of stimuli (sinusoidal bursts, pulses, noise, instrumental tones, or instrumental sequences), and experimental procedures (time-order judgments or the detection of asynchrony). However, some general conclusions can be drawn.

It was reported that audio delays are more difficult to detect than audio advances. Hirsh and Sherrick [17] found that a sound must be delayed 25 ms against hand-transmitted sinusoidal bursts to detect that the vibration preceded the sound. However, vibrations had to be delayed only 12 ms to detect asynchrony. A similar asymmetry was observed by Altinsoy [1] using broadband noise bursts reproduced via headphones and broadband vibration bursts at the fingertip: Stimuli with audio delays of approximately 50 to −25 ms were judged to be synchronous, and the *point of subjective simultaneity* (PSS) shifted toward an audio delay of approximately 7 ms. Detection thresholds for auditory-tactile asynchrony appear to also depend on the type of stimulus. In an experiment reproducing broadband noise and sinusoidal seat vibrations, audio delays from 63 to −47 ms were found to be synchronous [2]. Using the same setup, audio delays from 79 to −58 ms were judged to be synchronous regarding sound and seat vibrations from a car passing a bump [2].

For musical tones, the PSS appears to vary considerably for instruments with different attack or decay times. For example, PSS values as high as −135 ms for pipe organ or −29 ms for bowed cello have been reported [9, 43]. In contrast, PSS values as low as −2 ms for kick drum or −7 ms for piano tones were obtained [43]. Similarly, low PSS values were obtained using impact events reproduced via a vibration platform [22].

Thus, auditory-tactile asynchrony detection appears to depend on the reproduced signal. Impulsive content is clearly more prone to delay between modalities. Because music often contains transients, the delay between sound and vibration in this study was set to 0 ms. However, for a real-time implementation of audio-generated vibration reproduction, a slight delay appears to be tolerable or even advantageous in some cases. Additionally, the existence of perceptual adaptation mechanisms—which can widen the temporal window for auditory-tactile integration after prolonged exposure to asynchronous stimuli—has been demonstrated [37].

## *7.2.3 Setup*

A reproduction system was developed that is capable of separately generating seat vibrations and sound. A surround setup was used, according to ITU-R BS.775-1 [18], with five Genelec 8040A loudspeakers and a Genelec 7060B subwoofer. The system was equalized to a flat frequency response at the listener position. To place the participant in a standard multimedia reproduction context, an accompanying movie from the DVD was projected onto a silver screen. The video sequence showed the stage, conductor, or individual instrumentalists while playing.

Vibrations were reproduced using a self-made seat based on an RFT Messelektronik Type 11076 electrodynamic shaker connected to a flat, hard wooden board (46 cm × 46 cm). Seat vibrations were generated vertically, as shown in Fig. 7.3.

The participants were asked to sit on the vibration seat, with both feet flat on the ground. If necessary, wooden plates were placed beneath the participant's feet to adjust for different lengths of legs. The transfer characteristic of the vibrating chair (relation between acceleration at the seat surface and input voltage) was strongly dependent on the individual person. This phenomenon is referred to as the body-related transfer function (BRTF). Differences of up to approximately 10 dB have been measured for different participants [5]. Considering the just-noticeable difference in thresholds for vertical seat vibrations, which is approximately 1 dB [6, 13, 36], the individual BRTFs should be compensated for during perceptional investigations. The BRTF of each participant was individually monitored and equalized during all experiments. Participants were instructed not to change their sitting posture after calibration until the end of the experiment. The transfer functions were measured using a vibration pad (B&K Type 4515B) and a Sinus Harmonie Quadro measuring board, and they were compensated for by means of inverse filtering in MATLAB. This procedure resulted in a flat frequency response over a broad frequency range (±2 dB from 10 to 1000 Hz). An exemplary BRTF, with and without individual compensation, is shown in Fig. 7.4.

**Fig. 7.4** Body-related transfer functions measured at the seat surface of the vibration chair, with and without compensation plotted with 1/24th octave intensity averaging
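As a rough illustration of the equalization idea, the following Python sketch computes magnitude-only compensation gains from a measured BRTF. It is not the MATLAB inverse filter used in the study, whose details are not given here; the function name, the clipping limit `max_gain_db`, and the choice of the in-band mean as the target level are assumptions.

```python
import numpy as np

def compensation_gains_db(freqs_hz, brtf_db, band=(10.0, 1000.0), max_gain_db=20.0):
    """Per-frequency gains (in dB) that flatten a measured BRTF magnitude.

    Inside `band`, the gain is the difference between the mean in-band level
    and the measured level. Outside the band, no correction is applied.
    Gains are clipped to +/- max_gain_db (an assumed safety limit).
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    brtf_db = np.asarray(brtf_db, dtype=float)
    in_band = (freqs_hz >= band[0]) & (freqs_hz <= band[1])
    target_db = brtf_db[in_band].mean()              # level to equalize toward
    gains = np.where(in_band, target_db - brtf_db, 0.0)
    return np.clip(gains, -max_gain_db, max_gain_db)
```

A practical inverse filter would additionally need a filter design step and possibly phase handling, which this sketch leaves out.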

## *7.2.4 Participants*

Twenty participants (14 male and 6 female) took part voluntarily in this experiment. Most of them were students; ages ranged from 20 to 55 years (mean 24 years) and body weights from 58 to 115 kg (mean 75 kg). All of the participants stated that they had no known hearing or spine damage. The average number of self-reported concert visits per year was nine, and ranged from one to approximately 100. Two participants were members of bands. The preferred music styles varied, ranging from rock and pop to classical and jazz. Fifteen participants had not been involved in music-related experiments before, whereas five had already participated in two similar pilot experiments [24, 25].

## *7.2.5 Procedure*

The concert recordings were played back to each participant using the audio setup described above, while vibrations were reproduced using the vibration chair. The vibration intensities were initially adjusted so that the peak acceleration levels reached approximately 100 dB (re 10<sup>−6</sup> m/s<sup>2</sup>), which were clearly perceptible. However, perception thresholds can vary heavily between participants [32]; therefore, each participant was asked to adjust the vibration amplitude to the preferred level. This adjustment was typically performed within the first 5–10 s of a sequence. Subsequently, the participant had to judge the *overall quality of the concert experience* using a quasi-continuous scale. Verbal anchor points ranging from bad to excellent were added, similar to the method described in ITU-T P.800 [19]. Figure 7.5 presents the rating scale that was used.
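For orientation, the stated peak level can be converted to a physical acceleration with the usual 20·log10 level definition. The Python sketch below uses hypothetical function names; only the reference value of 10<sup>−6</sup> m/s<sup>2</sup> is taken from the text.

```python
import math

A_REF = 1e-6  # reference acceleration in m/s^2, as stated in the text

def level_to_acceleration(level_db, a_ref=A_REF):
    """Convert an acceleration level in dB to an acceleration in m/s^2."""
    return a_ref * 10 ** (level_db / 20)

def acceleration_to_level(acc, a_ref=A_REF):
    """Convert an acceleration in m/s^2 to an acceleration level in dB."""
    return 20 * math.log10(acc / a_ref)

# A peak level of 100 dB corresponds to roughly 0.1 m/s^2.
peak_acceleration = level_to_acceleration(100)
```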

To prevent dissatisfaction, the participants could interrupt the current stimulus as soon as they were confident with their judgment. The required time varied from 30 s to typically no more than 60 s. After rating the overall quality, the participants were encouraged to briefly formulate reasons for their judgments.

Each participant was asked to listen to 84 completely randomized stimuli, 21 for each music sequence. The stimuli were divided into blocks of eight. After each block, the participant had the opportunity to relax before continuing with the experiment. Typically, it took approximately 35 min to complete three to four blocks. After 45 min at most, the experimental session was interrupted and was continued on the next day (and the next, if necessary). Thus, two to three sessions were required for each participant to complete the experiment.

Before starting the experiment, the participants had to undergo training with three stimuli to become familiar with the task and stimulus variations. The stimuli consisted of the first 90 s from BMG using three very different vibration-generation approaches. This training was repeated before each subsession.

MATLAB was used to control the entire experimental procedure (multimodal playback, randomization of stimuli, measurement and calibration of individual BRTFs, guided user interface, and data collection).

## **7.3 Vibration Generation: Approaches and Results**

Five different approaches to generating vibration stimuli from the audio signal are described in this section. The first four approaches were implemented to modify mainly the frequency content of the signal. The main target was to reduce higher frequencies in order to eliminate tingling sensations and to avoid high-frequency sound radiation. In Sect. 7.3.1 the effect of simple low-pass filtering is evaluated. Reduction of the vibration signal to the fundamental frequency is discussed in Sect. 7.3.2. A frequency shifting algorithm is applied in Sect. 7.3.3, and substitution with artificial vibration signals is discussed in Sect. 7.3.4. In contrast to these frequency-domain algorithms, the last approach (described in Sect. 7.3.5) targets the dynamic range, thus affecting the perceived intensity of the vibration signal.

## *7.3.1 Low-Pass Filtering*

The simplest approach would be to route the sound (sum of the three frontal channels and LFE channel) directly to the vibration actuator. With some deviations, this process would correspond to the approximately linear transfer functions between sound pressure and vibration acceleration measured in real concert venues [27]. However, participants typically chose higher vibration levels in the laboratory, which resulted in significant sound generation from the actuator, especially in the high-frequency range. To address this, the signal was low-pass-filtered using a steep 10th-order Butterworth filter with cutoff frequency set to either 100 or 200 Hz, as illustrated in Fig. 7.6. However, the spurious sound produced by the vibration system could not be completely suppressed. The resulting multimodal sequences were reproduced and evaluated in the manner described above.
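The low-pass stage can be sketched with SciPy as follows. This is an illustrative sketch rather than the implementation used in the study; the function name is hypothetical, the sampling rate is supplied by the caller, and the use of second-order sections is a choice made here for numerical robustness at low cutoff frequencies.

```python
import numpy as np
from scipy import signal

def lowpass(x, fs, cutoff_hz, order=10):
    """Apply a Butterworth low-pass filter (10th order by default).

    Second-order sections are used because a 10th-order filter with a
    cutoff far below the sampling rate is ill-conditioned in (b, a) form.
    """
    sos = signal.butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    return signal.sosfilt(sos, x)
```

For example, `lowpass(mono_sum, 48000, 100)` and `lowpass(mono_sum, 48000, 200)` would correspond to the two cutoff conditions, assuming a 48 kHz signal named `mono_sum`.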

For the statistical analysis, the individual quality ratings were interpreted as numbers on a linear scale from 0 to 100, respectively corresponding to 'bad' and 'excellent.' The data were checked for a sufficiently normal distribution with the Kolmogorov–Smirnov test (KS test). A two-factor repeated-measures ANOVA was performed using the SPSS statistical software,<sup>1</sup> which also checks for the homogeneity of variances. The two factors were the played music *sequence* and the applied *treatment*. Averaged results (20 participants) for the overall quality evaluation are plotted in Fig. 7.7 as the mean and 95% confidence intervals. The quality ratings for the concert reproduction *without vibration* are shown on the left.

Reproduction with vibration was judged to be better than reproduction without vibration. Post hoc pairwise comparisons confirmed that both low-pass treatments were judged to be better than the reference condition at a highly significant level (p *<* 0.01), both with an average difference of 27 scale units, using Bonferroni correction for multiple testing. This finding corresponds to approximately one unit on the five-point scale shown in Fig. 7.5. The effect seems to be strongest for the BMG pop music sequence; however, no significant effects for differences between sequences or interactions between sequences and treatments are observed.

<sup>1</sup>https://en.wikipedia.org/wiki/SPSS. Last accessed on Nov 10, 2017.

**Fig. 7.6** Signal processing chain to generate vibration signals from the audio sum. The signal was filtered with a variable low-pass filter, and the BRTF of the vibration chair was compensated individually

Using the 200 Hz cutoff frequency, the participants occasionally reported tingling sensations on the buttocks or thighs, which only few of them liked. This finding could explain the slightly larger confidence intervals for this treatment.

The positive effect of reproducing vibrations generated by simple low-pass filtering and the negligible difference between the low-pass frequencies of 100 and 200 Hz are in agreement with earlier results [25].

## *7.3.2 Reduction to Fundamental Frequency*

In the previous section, low-pass-filtered vibrations were found to be effective for multimodal concert reproduction. However, especially for the low-pass 200 Hz condition, some spurious sound was generated by the vibration system. This fact is particularly critical if the audio signal is reproduced for one person via headphones, as a second person in the room would be quite disturbed by only hearing the sound generated by the shaker. An attempt was undertaken to further reduce such undesired sound. This goal could be accomplished, e.g., by insulating the vibrating surfaces as much as possible. Because good insulation is difficult to achieve in our case, one effective approach would be to reduce the vibration signal to the fundamental spectral component contained in the signal.

A typical tone generated by an instrument consists of a strong fundamental frequency and several higher-frequency harmonics. If different frequencies are presented simultaneously, strong masking effects toward higher frequencies can be observed in the tactile domain [14, 41]. It can be assumed that the fundamental component considerably masks higher frequencies. Therefore, it might be possible to remove the harmonics completely in the vibration-generation process without noticeable effects. This approach is illustrated in Fig. 7.8. The fundamentals below 200 Hz of the summed audio signals were tracked using the Fiddle algorithm [39] in Pd, which detects spectral peaks. The cutoff frequency of a first-order low-pass filter was then adaptively adjusted to the lowest frequency peak (i.e., the fundamental). If no fundamental was detected, the low-pass filter was set to 100 Hz to preserve broadband impulsive events.
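The adaptive filtering idea can be sketched in Python as follows. Note that this sketch does not use the Fiddle algorithm: it substitutes a plain frame-wise FFT peak search, and the frame length, the relative peak threshold, and all function names are assumptions made for illustration.

```python
import numpy as np

def lowest_peak_hz(frame, fs, fmax=200.0, rel_threshold=0.1):
    """Return the lowest spectral peak below fmax, or None if there is none.

    A bin counts as a peak if it is a local maximum and exceeds
    rel_threshold times the largest magnitude in the frame (assumed rule).
    """
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    limit = rel_threshold * spectrum.max()
    for k in range(1, len(spectrum) - 1):
        if freqs[k] > fmax:
            break
        is_peak = spectrum[k] > spectrum[k - 1] and spectrum[k] >= spectrum[k + 1]
        if is_peak and spectrum[k] > limit:
            return freqs[k]
    return None

def fundamental_lowpass(x, fs, frame=4096, fallback_hz=100.0):
    """First-order low-pass whose cutoff follows the lowest detected peak."""
    x = np.asarray(x, dtype=float)
    y = np.zeros(len(x))
    state = 0.0
    for start in range(0, len(x), frame):
        block = x[start:start + frame]
        f0 = lowest_peak_hz(block, fs) if len(block) == frame else None
        cutoff = f0 if f0 is not None else fallback_hz   # 100 Hz fallback
        alpha = 1.0 - np.exp(-2.0 * np.pi * cutoff / fs)  # one-pole coefficient
        for n, value in enumerate(block):
            state += alpha * (value - state)
            y[start + n] = state
    return y
```

With a frame of 4096 samples at 48 kHz, the frequency resolution is about 12 Hz, so the tracked cutoff is only a coarse estimate of the fundamental.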

The results from the evaluation of the resulting concert reproduction are plotted in Fig. 7.9. The statistical analysis was executed in the same manner as in the previous section. Again, the overall quality of the concert experience improved when vibrations were added (very significant, p *<* 0.01). At the same time, the generation of high-frequency components could be reduced, except for conditions in which the fundamental frequency approached 200 Hz, e.g., in the VERDI sequence (see Fig. 7.1b). For VERDI and DVORAK, some participants again reported tingling sensations. For BMG and DVORAK, the participants reported that it was difficult to adjust the vibration magnitude because the vibration intensity varied unexpectedly.

**Fig. 7.8** Signal processing chain to generate vibration signals from the audio sum. The fundamental below 200 Hz was tracked, and an adaptive low-pass filter was adjusted to this frequency to suppress all harmonics. If no fundamental was detected, the low-pass filter was set to 100 Hz

The average difference in perceived quality with and without vibrations was 26 scale units. Interestingly, the differences between sequences increased. The strongest effect was observed for the BMG sequence compared with the other sequences (significant interaction between treatment and sequence, p *<* 0.05). The spectrogram in Fig. 7.2b reveals that for the BMG sequence, the fundamentals always lay below 100 Hz and the first harmonic almost always lay above 100 Hz. Therefore, the fundamental filtering, as implemented here, almost corresponded to the low-pass-filtering condition, with a cutoff at 100 Hz. As expected, the resulting overall quality was judged to be similar in both cases (no significant difference; compare with Fig. 7.7).

In addition, Fig. 7.2b reveals that the first harmonic of the electric bass is slightly stronger than the fundamental. However, the intensity balance between fundamentals and harmonics is constant over time, resulting in a good match between sound and vibration. This relationship is not the case for the BACH sequence, plotted in Fig. 7.1a. The intensity of the lowest frequency component is high within the first 10 s and then suddenly weakens, whereas the intensities of higher frequencies increase simultaneously. If only the lowest frequency is reproduced as a vibration, this change in balance between frequencies might result in a mismatch between auditory and tactile perception, which would explain the poor-quality ratings for the BACH sequence using the fundamental frequency approach.

With increasing loudness, the tone color of many instruments is characterized by strong harmonics in the frequency spectrum [34]. The fundamental is not necessarily the most intense component and can even be missing completely. Nevertheless, the auditory system integrates all harmonics into one tone, in which all partials contribute to the overall intensity. In addition, different simultaneous tones can be played with different intensities depending on the composition. Therefore, a more complex approach could be beneficial: the lowest pitch could be estimated and used to generate the vibration, while the intensity of the vibration would still depend on the overall loudness within a specific frequency range. In this manner, a good match between both modalities might be achieved, although the processing is complex and could require greater computing capacity. Better matching the intensities appears to be a crucial factor and will be further evaluated in Sect. 7.3.5.

## *7.3.3 Octave Shift*

Another approach would be to shift down the frequency spectrum of the vibration signal. In this manner, the spurious high-frequency sound could be further reduced and the tingling sensation eliminated.

**Fig. 7.10** Distribution of crossmodal frequency-matched seat vibrations to acoustical tones with various frequencies *f*, according to Altinsoy and Merchel [4]

The frequency resolution of the tactile sense is considerably worse than that of audition [31]; therefore, it might be acceptable to strongly compress vibration signals in the frequency domain while still preserving perceptual integration with the respective sound. Earlier experiments have been conducted to test whether participants can match the frequencies of sinusoidal tones and vibrations presented through a seat [4]. The results are summarized in Fig. 7.10. The participants were able to match the frequencies of both modalities with some tolerance. In most cases, the participants also judged the lower octave of the auditory frequency to be suitable as a vibration frequency. Therefore, the decision was made to shift all the frequencies down one octave, i.e., dividing their original values by two. This shift corresponds to compression in the frequency range, with stronger compression toward higher frequencies. As shown in Fig. 7.11, before pitch-shifting the original summed audio signal was prefiltered via one of the methods described above (i.e., low-pass filtering or reduction to fundamental frequency). Pitch-shifting was performed in Pd using a granular synthesis approach: The signal was cut into grains of 1000 samples, which were slowed by half and summed again using overlapping Hann windows. Using this method, some high-frequency artifacts occurred, which were subsequently filtered out using an additional low-pass filter set at 100 Hz. The resulting low-pass-shifted vibration signals were evaluated as described above. Results are plotted in Fig. 7.12. Again, the statistical analysis was performed using ANOVA after testing the preconditions.
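A granular octave shift of this kind can be sketched in Python as below. This is a simplified illustration and not the Pd patch used in the study: the 50% overlap of the stretched grains, the linear interpolation, and the function name are assumptions, and the subsequent 100 Hz low-pass filter is omitted.

```python
import numpy as np

def octave_down(x, grain=1000):
    """Shift a signal down one octave while keeping its duration.

    Each input grain of `grain` samples is read at half speed, which
    doubles its length and halves its frequencies. The stretched grains
    are Hann-windowed and overlap-added at the original grain positions.
    """
    x = np.asarray(x, dtype=float)
    out_len = 2 * grain
    window = np.hanning(out_len)
    read_pos = np.arange(out_len) / 2.0      # half-speed read positions
    grain_idx = np.arange(grain)
    y = np.zeros(len(x) + out_len)
    norm = np.zeros(len(x) + out_len)
    for start in range(0, len(x) - grain + 1, grain):
        segment = x[start:start + grain]
        stretched = np.interp(read_pos, grain_idx, segment)
        y[start:start + out_len] += window * stretched
        norm[start:start + out_len] += window
    norm[norm < 1e-6] = 1.0                  # avoid division by zero at the edges
    return (y / norm)[:len(x)]
```

Because neighboring grains are generally not phase-aligned, a shift of this kind introduces artifacts for most input frequencies, which is why some post-filtering is typically needed.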

For the BACH sequence, shifting the lowest fundamental even farther down resulted in generally poor-quality ratings. The occasionally weak fundamental components in this sequence caused crossmodal intensity mismatch between vibration and sound, which was perceived as louder. However, the perceived quality increases with the bandwidth of the signal, i.e., when using pre-filtering with higher cutoff frequency, most likely due to a better intensity match between modalities.

**Fig. 7.11** Signal processing chain to generate vibration signals from the audio sum. Compression was applied in the frequency range by shifting all of the frequencies down one octave using granular synthesis. To suppress high-frequency artifacts, a 100 Hz low-pass filter was subsequently inserted

**Fig. 7.12** Mean overall quality evaluation for no-vibration and various octave-shift vibration-generation approaches, plotted with 95% confidence intervals

The quality scores for the BMG sequence depend much less on the initial filtering. As discussed before, the differences between the 'fundamental' condition and the 'low-pass 100 Hz' condition are small. By octave-shifting the signals, the character of the vibration changed. Some participants described the vibrations as 'wavy' or 'bumpy' rather than as 'humming,' as they had previously done. However, many participants liked the varied vibration character, and the averaged quality ratings did not change significantly compared with Figs. 7.7 and 7.9. No further improvement was found for broader bandwidth of the pre-filtered signal, for the reasons already discussed in the previous section.

Results were significantly different for the DVORAK and VERDI sequences. In Sect. 7.3.1, no preference for one of the two low-pass conditions was observed. However, when these sequences are additionally shifted in frequency, an increase in quality for the 200 Hz low-pass treatment is found, as shown in Fig. 7.12. This could be explained by considering the periods during which the lowest frequency component is greater than 100 Hz (e.g., VERDI second 10–17). By octave-shifting these components while retaining their acceleration levels, they become perceptually more intense due to the decreasing equal-intensity contours for seat vibrations [30]. In addition, the vibrations were reported to cause less tingling. The same result held true for octave-shifting the fundamental.

The dependence of the quality scores on the music sequence and the filtering approach was confirmed statistically by the very significant (p *<* 0.01) effects for the factor sequence, the factor treatment, and the interaction of both. On average, all of the treatment conditions were judged to be better than without vibrations on a very significant level (p *<* 0.01). No statistically significant differences between the 'fundamental' and the 'low-pass 100 Hz' conditions were observed. However, the 'low-pass 200 Hz' condition was judged to be slightly but significantly better (p *<* 0.05) than the 'fundamental' (averaged difference = 11) and the 'low-pass 100 Hz' (averaged difference = 9) treatments with octave shifting. As explained above, these main effects must be interpreted in the context of the differences between sequences.

It can be concluded that octave-shifted vibrations appeared to be integrable with the respective sound in many cases. The best-quality scores were achieved, independent of the sequence used, by applying a higher low-pass frequency, e.g., 200 Hz.

## *7.3.4 Substitute Signals*

It was hypothesized in the previous section that the variance in the vibration character that resulted from the frequency shift would not negatively influence the quality scores. Thus, it might be possible to compress the frequency range even more. This approach was evaluated using several substitute signals and is discussed in this section. Figure 7.13 presents the signal processing chain. A signal generator was implemented in Pd to produce continuous sinusoidal tones at 20, 40, 80, and 160 Hz. The frequencies were selected to span a broad frequency range and to be clearly distinguishable considering the just-noticeable differences (JNDs) for seat vibrations [31]. Additionally, a condition was included using white Gaussian noise (WGN) low-pass-filtered at 100 Hz. These substitute signals were then multiplied with the amplitude envelope of the original low-pass-filtered signal to retain its timing information. An envelope follower was implemented, which calculated the RMS amplitude of the input signal using successive analysis windows. Hann windows of 1024 samples (approximately 21 ms) were applied to avoid smearing the impulsive content, and successive windows were spaced by half the window size.
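The envelope extraction and substitution can be sketched in Python as follows. This is an illustration under stated assumptions and not the Pd implementation: weighting the squared signal with the Hann window, interpolating the envelope linearly between window centers, and the function names are all choices made here.

```python
import numpy as np

def rms_envelope(x, window=1024, hop=512):
    """Hann-weighted RMS envelope, interpolated back to the sample rate."""
    x = np.asarray(x, dtype=float)
    if len(x) < window:
        raise ValueError("signal must be at least one window long")
    w = np.hanning(window)
    starts = np.arange(0, len(x) - window + 1, hop)
    rms = np.array([
        np.sqrt(np.sum(w * x[s:s + window] ** 2) / np.sum(w)) for s in starts
    ])
    centers = starts + window / 2.0
    # Linear interpolation between window centers; edges hold the end values.
    return np.interp(np.arange(len(x)), centers, rms)

def substitute_vibration(lowpassed, fs, freq_hz):
    """Amplitude-modulate a sinusoid with the envelope of the filtered signal."""
    t = np.arange(len(lowpassed)) / fs
    return rms_envelope(lowpassed) * np.sin(2.0 * np.pi * freq_hz * t)
```

For instance, `substitute_vibration(filtered, 48000, 40)` would produce a 40 Hz substitute signal, assuming `filtered` holds the low-pass-filtered audio sum at 48 kHz. A noise condition could be built the same way by replacing the sinusoid with low-pass-filtered noise.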

**Fig. 7.13** Signal processing chain to generate vibration signals from the audio sum. The envelope of the low-pass-filtered signal was extracted and multiplied with substitute signals, such as sinusoids at 20, 40, 80, and 160 Hz or white noise

**Fig. 7.14** Mean overall quality evaluation for no-vibration and various substitute vibration-generation approaches, plotted with 95% confidence intervals

The quality scores are presented in Fig. 7.14. An ANOVA was applied for the statistical analysis. All of the substitute vibrations, except for the 20 Hz condition, were judged to be better than reproduction without vibration at a highly significant level (p *<* 0.01). The average differences, compared with the no-vibration condition, were between 29 scale units for the 40 Hz vibration and 18 scale units for WGN and the 160 Hz vibration. There was no significant difference between the 20 Hz vibration and the no-vibration condition. The participants indicated that the 20 Hz vibration was too low in frequency and did not fit with the audio content. In contrast, 40 and 80 Hz appeared to fit well. No complaints about a mismatch between sound and vibration were noted. The resulting overall quality was judged to be comparable to the low-pass conditions in Fig. 7.7.

Notably, even the 160 Hz vibration resulted in fair-quality ratings. However, compared with the 80 Hz condition, a trend toward worse judgments was observed (p ≈ 0.11). A much stronger effect was expected because this vibration frequency is relatively high, and tingling effects can occur. There was some disagreement between participants, which can be observed in the larger confidence intervals for this condition.

Even more interesting, the reproduction of WGN resulted in fair-quality ratings. However, this condition was still judged to be slightly worse than the 40 and 80 Hz vibrations (average difference = 11, p *<* 0.05). The effect was strongest for the BACH sequence, which resulted in poor-quality judgments (very significant interaction between sequence and treatment, p *<* 0.01). The BACH sequence contained long tones that lasted for several seconds, which did not fit with the 'rattling' vibrations excited by the noise. In contrast, in the BMG, DVORAK, and VERDI sequences, impulses and short tones resulted in brief vibration bursts of white noise, which felt less like 'rattling.' Nevertheless, the character of the bursts was different from sinusoidal excitation. Specifically, in the BMG sequence the amplitude of the transient vibrations generated by the bass drum varied depending on the random section of the noise. This finding is most likely one of the reasons why the quality judgment for BMG in the noise condition tended to be worse compared, e.g., with the approach using a 40 Hz vibration.

Given these observations, it appears that even simple vibration signals can result in good reproduction quality. For the tested sequences, amplitude-modulated sinusoids at 40 and 80 Hz worked well.

## *7.3.5 Compression of Dynamic Range*

In the previous experiments, the overall vibration intensity was adjusted individually by each test participant. However, the intensity differences between consecutive vibration components or between vibration components at different frequencies were kept constant. In the pilot experiments [25], it was reported that expected vibrations were sometimes missing. This might be because of the differing frequency-dependent thresholds and growth of sensations for the auditory and tactile modality [30]. Therefore, an attempt was undertaken to better adapt the signals to the different dynamic ranges.

To better match crossmodally the growth of auditory and tactile sensation with increasing sound and vibration intensity, the music signal is compressed in the vibration-generation process, as illustrated in Fig. 7.15. As one moves toward lower frequencies, the auditory dynamic range decreases gradually and the growth of sensation with increasing intensity rises more quickly [44]. In the tactile modality, the dynamic range is generally smaller than for audition; however, no strong dependence on frequency between 10 and 200 Hz was found [30]. Accordingly, there was not much variation between frequencies in the growth of sensation of seat vibrations with increasing intensity. Therefore, less compression seems necessary toward lower frequencies. However, a frequency-independent compression algorithm was implemented for simplicity.

**Fig. 7.15** Signal processing chain to generate vibration signals from the audio sum. The low-pass-filtered signal was compressed using different compression factors

**Fig. 7.16** Mean overall quality evaluation for no-vibration and different dynamics compression vibration-generation approaches, plotted with 95% confidence intervals

The amount of compression needed for ideal intensity matching between both modalities was predicted using crossmodal matching data [26]. For moderate sinusoidal signals at 50, 100, and 200 Hz, a 12 dB increase in sound pressure level matched well with an approximately 6 dB increase in acceleration level, which corresponds to a compression ratio of two. Further, the curve of sensation growth versus sensation level flattens toward higher sensation levels in the auditory [16] and tactile domains [35]. This finding might be important because loud music typically excites weak vibrations. The effect can be accounted for by using higher compression ratios. Therefore, three compression ratios (two, four, and eight) were selected for testing. Attack and release periods of 5 ms were chosen to follow the source signals quickly.

Statistical analysis was applied as described above using a repeated-measures ANOVA and *post hoc* pairwise comparisons with Bonferroni correction. The quality scores for the concert experience using the three compression ratios are plotted in Fig. 7.16. Again, the no-vibration condition was used as a reference. Compressing the audio signal by a ratio of 2 resulted in significantly improved quality perception as compared to the no-vibration condition (average difference = 26, p *<* 0.01). Although the ratings were not statistically better than the 100 Hz low-pass condition in Sect. 7.3.1, some test participants reported that the initial-level adjustment was easier, particularly for the DVORAK sequence. This finding is plausible because the DVORAK sequence covers quite a large dynamic range at low frequencies, which might have resulted in missing vibration components if the average amplitude was adjusted too low or in mechanical stimulation that was too strong if the average amplitude was adjusted too high. Therefore, compressing the dynamic range could have made it easier to select an appropriate vibration level.

Increasing the compression ratio further to 4 or 8 reduced the averaged quality scores (average difference between ratios 2 and 4 = 11, p *<* 0.05; average difference between ratios 2 and 8 = 18, p *<* 0.01). The reason for this decrease in quality appeared to be the noise floor of the audio signal, which was also amplified by the compression algorithm. This vibration noise was primarily noticeable and disturbing during the passages of music with little or no low-frequency content. In particular, such passages are found in BACH and VERDI. This would explain the poor ratings for these sequences already at a compression ratio of 4. To check this hypothesis, the compression ratio was set to 8, this time using a threshold, and tested again. Loud sounds above the threshold were compressed, whereas quieter sounds remained unaffected. The threshold was adjusted for each sequence so that no vibrations were perceivable during passages with little frequency content below 100 Hz. The resulting perceptual scores are plotted on the right side in Fig. 7.16. The quality was judged to be significantly better compared with the no-vibration condition (average difference = 34, p *<* 0.01) and with compression ratios of 4 and 8 without a threshold (average difference = 18 and 26, respectively, p *<* 0.01). However, there was no significant difference compared with a compression ratio of 2. These findings indicate that even strong compression might be applied to music-induced vibrations without impairing the perceived quality of a concert experience. On the contrary, compression appears to reduce the impression of missing vibrations, and thus makes it easier to adjust the vibration level. However, a suitable threshold must be selected for strong compression ratios. Setting such a threshold appears possible if the source signal has a wide dynamic range, which is typically the case for classical recordings. In contrast, modern music or movie soundtracks are occasionally already highly compressed with unknown compression parameters, which could be problematic.
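The role of the threshold can be illustrated with a static compression curve in the level (dB) domain. The following is only a minimal sketch: the function name `compress_level_db`, the 0 dB reference used when no threshold is given, and the example levels (a peak at 0 dB, a noise floor at -60 dB, a threshold at -40 dB) are assumptions made for illustration and do not describe the compressor actually used in the study.

```python
def compress_level_db(level_db, ratio, threshold_db=None):
    """Static compression curve (illustrative sketch, not the study's algorithm).

    Without a threshold, all levels are scaled towards an assumed 0 dB
    reference, so the whole dynamic range shrinks by `ratio`.
    With a threshold, only levels above it are compressed.
    """
    if ratio < 1.0:
        raise ValueError("ratio must be >= 1")
    if threshold_db is None:
        return level_db / ratio
    if level_db <= threshold_db:
        return level_db  # quiet passages (and the noise floor) stay untouched
    return threshold_db + (level_db - threshold_db) / ratio


# Hypothetical levels: music peak at 0 dB, noise floor at -60 dB.
peak_db, noise_db = 0.0, -60.0

# Ratio 8 without threshold: peak-to-noise distance shrinks from 60 dB to 7.5 dB.
range_no_threshold = compress_level_db(peak_db, 8) - compress_level_db(noise_db, 8)

# Ratio 8 with a -40 dB threshold: the noise floor is left where it was.
range_with_threshold = (compress_level_db(peak_db, 8, -40.0)
                        - compress_level_db(noise_db, 8, -40.0))

print(range_no_threshold, range_with_threshold)  # 7.5 25.0
```

In this toy case the noise floor ends up only 7.5 dB below the peak when no threshold is used, which is consistent with the observation that vibration noise became noticeable during quiet passages.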

## *7.3.6 Summary*

Various audio-induced vibration-generation approaches have been developed based on fundamental knowledge about auditory and tactile perception. The perceived quality of concert reproduction using combined loudspeaker sound and seat vibrations was evaluated. It can be summarized that seat vibrations can have a considerably positive effect on the experience of music. Since the test participants evaluated all approaches in completely randomized order, the resulting mean overall quality values can be directly compared. The quality scores for concert experiences using some of the vibration-generation approaches are summarized in Fig. 7.17 (all judged very significantly better than without vibrations, p *<* 0.01).

**Fig. 7.17** Mean overall quality evaluation for music reproduction using selected vibration-generation approaches. For better illustration, individual data points have been connected with lines

The low-pass filter approach is most similar to vibrations potentially perceived in real concert halls and resulted in good-quality ratings. The approach is not computationally intensive and can be recommended for reproduction systems with limited processing power. Because the differences between a low-pass filter of 100 Hz and 200 Hz were small, the lower cutoff frequency is recommended to minimize sound generation from the vibration system. With additional processing, the unwanted sound can be further reduced while preserving good-quality scores. To this end, one successful approach involves compression in the frequency range, e.g., using octave shifting. Surprisingly, even strong frequency limitation to a simple amplitude-modulated sinusoidal signal seems to be applicable. This allows for much simpler and cheaper vibration reproduction systems, e.g., in home cinema scenarios. However, some signal processing power is necessary, e.g., to extract the envelope of the original signal. Furthermore, it seems useful to apply some dynamic compression, which makes it easier to adjust the vibration level. In this study, source signals with a high dynamic range have been used as a starting point. Further evaluation using audio data whose dynamics are already compressed with unknown parameters is necessary.
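The amplitude-modulated sinusoid mentioned above can be sketched in a few lines. This is a hedged illustration only: the envelope follower (rectification followed by one-pole smoothing), the function names, the smoothing coefficient and the 80 Hz carrier are all assumptions chosen for the example, not the parameters used in the experiments.

```python
import math


def envelope(signal, smoothing=0.01):
    """Rectify the signal and smooth it with a one-pole low-pass (assumed method)."""
    env, out = 0.0, []
    for sample in signal:
        env += smoothing * (abs(sample) - env)
        out.append(env)
    return out


def am_sinusoid(signal, sample_rate, carrier_hz, smoothing=0.01):
    """Impose the envelope of `signal` on a sinusoidal carrier."""
    env = envelope(signal, smoothing)
    return [e * math.sin(2.0 * math.pi * carrier_hz * n / sample_rate)
            for n, e in enumerate(env)]


# Hypothetical input: 0.1 s of a 440 Hz tone sampled at 8 kHz.
rate = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * n / rate) for n in range(800)]
vibration = am_sinusoid(tone, rate, carrier_hz=80.0)
```

Because only the envelope and a single carrier frequency are needed, a narrow-band actuator would in principle suffice for this kind of signal.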

Participants usually chose higher acceleration levels in the laboratory compared to measurements in real concert situations. It can be hypothesized that the absolute acceleration level influences the perceived quality of a concert experience. This question should be examined in a further study.

In summary, test participants seemed to be relatively tolerant to a wide range of music and seat vibration combinations. Perhaps our real-life experience with the simultaneous perception of auditory and tactile events is varied and expectations are therefore not strictly determined. For example, the intensity of audio-related vibrations might vary heavily between different concert venues. Additionally, various aspects of tactile perception are less refined than for audition. In particular, frequency resolution and pitch perception are strongly restricted [42] for touch, which allows the modification of frequency content within a wide range.

The effect of additional vibration reproduction depended to some extent on the selected music sequence. For example, the BMG rock music sequence was judged significantly better in most of the cases including vibrations than the classical compositions (see Fig. 7.17). This seems plausible because we expect strong audio-induced vibrations at rock concerts. However, adding vibrations seems to clearly increase the perceived concert quality, even for classical pieces of music.

## **7.4 Conclusions**

It has been shown in this chapter that there is a general connection between vibrations and the perceived quality of music reproduction. However, in this study only seat vibrations have been addressed, and a 5.1 surround sound setup was used. Interestingly, none of the participants complained about an implausible concert experience. Still, one could question whether the 5.1 reproduction situation can be compared with a live situation in a concert hall or church. Because test participants preferred generally higher acceleration levels, it is hypothesized that real halls could benefit from amplifying the vibrations in the auditorium. This could be achieved passively, e.g., by manipulating floor construction, or actively using electrodynamic exciters as in the described experiments. Indeed, in future experiments it would be interesting to investigate the effect of additional vibration in a real concert situation. Also, the vibration system could be hidden from participants in order to avoid possible biasing effects.

During the experiments, the test participants sometimes indicated that the vibrations felt like tingling. This effect could be reduced by removing higher frequencies or shifting them down. However, this processing also weakened the perceived tactile intensity of broadband transients. The question arises, what relevance do transients have for the perceived quality of music compared with steady-state vibrations? One approach to reduce the tingling sensations for steady-state tones and simultaneously keep transients unaffected would be to fade continuous vibrations with a long attack and a short release using a compressor. This type of temporal processing appears to be promising based on an unpublished pilot study and should be further evaluated.

Another approach for conveying audio-related vibration would be to code auditory pitch information into a different tactile dimension. For example, it would be possible to transform the pitch of a melody into the location of vibration along the forearm, tongue, or back using multiple vibration actuators. This frequency-to-place transformation approach is usually applied in the context of tactile hearing aids, in which the tactile channel is used to replace impaired auditory perception [20, 40]. However, in such sensory substitution systems, the transformation code needs to be learned. It has been shown in this study that it might not be necessary to code all available auditory information into the tactile channel to improve the perceived quality of music. Still, there is creative potential using this approach, which was applied in several projects [10, 11, 15].
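A frequency-to-place transformation can be as simple as mapping pitch, on a logarithmic scale, to the index of an actuator in a linear array. The sketch below is purely illustrative: the function name, the frequency limits of 55 Hz and 880 Hz and the eight-actuator array are hypothetical choices and are not taken from the cited systems.

```python
import math


def pitch_to_actuator(freq_hz, n_actuators, f_min=55.0, f_max=880.0):
    """Map a pitch to an actuator index (0 = lowest pitch), log-frequency scale."""
    clamped = min(max(freq_hz, f_min), f_max)
    position = math.log2(clamped / f_min) / math.log2(f_max / f_min)  # 0..1
    return min(int(position * n_actuators), n_actuators - 1)


# With eight actuators spanning four octaves, each octave covers two actuators.
print([pitch_to_actuator(f, 8) for f in (55.0, 110.0, 220.0, 440.0, 880.0)])
# [0, 2, 4, 6, 7]
```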

Another interesting effect is the influence of vibrations on loudness perception at low frequencies, the so-called auditory-tactile loudness illusion [33]. It was demonstrated that tones were perceived to be louder when vibrations were reproduced simultaneously via a seat. This illusion can be used to reduce the bass level in a discotheque or an automobile entertainment system [29] and might have an effect on the ideal low-frequency audio equalization in a music reproduction scenario.

## **References**



# Part II Haptic Musical Interfaces: Design and Applications

# **Chapter 8 The MSCI Platform: A Framework for the Design and Simulation of Multisensory Virtual Musical Instruments**

### **James Leonard, Nicolas Castagné, Claude Cadoz and Annie Luciani**

**Abstract** This chapter presents recent work concerning physically modelled virtual musical instruments and force feedback. Firstly, we discuss fundamental differences in the gesture–sound relationship between acoustic instruments and digital musical instruments, the former being linked by dynamic physical coupling, the latter by transmission and processing of information and control signals. We then present an approach that allows experiencing physical coupling with virtual instruments, using the CORDIS-ANIMA physical modelling formalism, synchronous computation and force-feedback devices. To this end, we introduce a framework for the creation and manipulation of multisensory virtual instruments, called the MSCI platform. In particular, we elaborate on the cohabitation, within a single physical model, of sections simulated at different rates. Finally, we discuss the relevance of creating virtual musical instruments in this manner, and we consider their use in live performance.

## **8.1 Introduction**

Computers have deeply changed our way of thinking, working, communicating and creating. The musical world is no exception to this transformation, whether in popular music, which now relies predominantly on electronic means, or in the processes of many modern composers who use software tools to address formal compositional problems, and to capture, synthesise, process and manipulate sound. The rapid advances in computer technology now enable real-time computing and interactive control of complex digital sound synthesis and processing algorithms. When coupled with interfaces that capture musical gestures and map them to the algorithms' parameters, such systems are named digital musical instruments (DMIs). They are now widespread musical tools and allow for a true form of virtuosity.

J. Leonard (✉) · N. Castagné

Laboratoire ICA—Ingénierie de la Création Artistique, Institut polytechnique de Grenoble, Université Grenoble Alpes, 46 Avenue Félix Viallet, 38000 Grenoble, France e-mail: james.al.leonard@gmail.com

C. Cadoz · A. Luciani

ACROE—Association pour la Création et la Recherche sur les Outils d'Expression & Laboratoire ICA—Ingénierie de la Création Artistique, Institut polytechnique de Grenoble, Université Grenoble Alpes, 46 Avenue Félix Viallet, 38000 Grenoble, France

S. Papetti and C. Saitis (eds.), *Musical Haptics*, Springer Series on Touch and Haptic Systems, https://doi.org/10.1007/978-3-319-58316-7\_8

However, a fundamental question arises as to the relationship between a musician and a DMI: is it of a similar nature to the relationship that is established with conventional instruments? This question is complex, especially given the available panoply of synthesis techniques and control paradigms. Moreover, digital synthesis brings forth an array of new possibilities for controlling musical timbres, as well as their arrangement at a macro-structural level. It is quite legitimate to ask oneself if these tools should be envisaged by analogy to acoustical instruments, e.g. if they should offer means of manipulation analogous to traditional instruments, or if they require entirely new control and interaction paradigms.

This issue finally questions the very definition of musical instrument: can (and should) a digital interface controlling a real-time sound synthesis process be called an instrument, in the sense that it enables an embodiment comparable to traditional instruments? Can DMIs and conventional instruments be grouped into the same category? Also, is controlling digital synthesis by imitating the way we interact with traditional instruments the most effective approach?

We discuss these issues by considering that the recreation of the physical *instrumental* relationship between musicians and DMIs is indeed relevant (see Chap. 2). When a digital sound synthesis process is physically based (i.e. relying on physical laws to create representations of sound-producing virtual objects), a bidirectional link between gesture and sound can be established that coherently transforms mechanical energy provided by the user into airborne vibrations of the virtual instrument. This is the case in acoustical and electroacoustical instruments: Cadoz refers to this energetic link as the *ergotic* function of instrumental gestures [6], and it has been shown to be a key factor in their expressiveness [24, 33].

The design of DMIs addressing these issues calls for:


Our answer to these requirements is the *Modeleur*-*Simulateur pour la Création Instrumentale* (MSCI) platform, a complete workstation for designing and crafting physics-based multisensory virtual musical instruments and for playing them with force feedback.

The following sections will present: (a) the specifics of multisensory virtual musical instruments, (b) hardware and software design for the MSCI platform, (c) considerations for modelling the mechanics of musical instruments and their decomposition into sections simulated at different rates and (d) use of the platform as a creative tool, including the first use of the MSCI platform by Claude Cadoz in a live performance.

## **8.2 A Physical Approach to Digital Musical Instruments**

The incorporation of haptic devices into musical applications has become a regular feature in the field of computer music, be it by using force-feedback systems or vibrotactile actuators, which are now present in widespread consumer electronics (common actuator technology is described in Sect. 13.2). Devices are becoming more affordable, and a large number of studies point towards the benefits yielded by such systems in terms of control and manipulation for musical tasks [2, 3, 16, 24, 27–29] (see also Chap. 6).

Two main approaches for integrating haptics in digital instrumental performance can be distinguished: (i) augmenting DMIs with haptic feedback to enhance their control and convey information to the user, or (ii) making a virtual instrument *tangible* by enabling gestural interaction with a haptic representation of all or part of the instrument's mechanical features. Concerning the latter case, at least two subcategories can be described, namely: (ii-a) the distributed approach, in which the user interacts haptically with a model of the gestural interface of the instrument, which in turn controls the sound synthesis process through feed-forward mapping strategies (historically referred to as *multimodal* approach at ACROE-ICA), and (ii-b) the unitary approach, in which the entire instrument is represented by a single physical model that is used to render audio, haptic and possibly even visual feedback (we refer to this single-model scenario as *multisensory*).

## *8.2.1 Distributed Approach to Haptic Digital Musical Instruments*

The distributed (or multi-model) approach to haptic DMIs follows the classic decomposition into gestural controller and sound synthesis sections [33]. The haptic, aural and sometimes visual stimuli are physically decoupled from each other, due to the distributed architecture of the instrument (see Fig. 8.1). Haptic feedback incorporated into the gestural controller enables coupling with certain components of the DMI, for instance, by programming the mechanical behaviour of the gestural control section using a local haptic model. Data extracted from the interaction between the user and this model can then be mapped to chosen sound synthesis parameters.

**Fig. 8.1** Distributed approach to haptic digital musical instruments

Some examples of this approach are the *Virtual Piano Action* by Gillespie [15], or the DIMPLE software [30] in which the user interacts with a rigid dynamics model, and information concerning this interaction (positions, collisions, etc.) is then mapped to an arbitrary sound synthesis process, possibly a physically based simulation.

Vibrotactile feedback inferred from the sound synthesis process itself can be provided to the user by integrating vibration actuators into the gestural controller. Such is the case of Nichols' vBow friction-driven haptic interface [26] or Marshall's vibrotactile feedback incorporated directly into DMI controllers [25].

Technical implementations of these systems generally rely on asynchronous computation loops for haptics and sound, employing low- to mid-priced haptic devices such as the Phantom Omni or the Novint Falcon. While these systems tend to bridge the gap between gestural control section and sound synthesis, the sound is still driven by mapping of sensor data, and the user physically interacts only with a local subsection of the instrument.

## *8.2.2 Unitary Approach to Virtual Musical Instruments*

An alternative approach to implementing haptic DMIs is to model the virtual instrument as a single multisensory physical object that jointly bears mechanical, acoustical and possibly visual properties, inherent to its physical nature. Physical modelling techniques are then the only viable approach. As a result, the gestural controller and sound synthesis sections are tightly interconnected: haptic interaction with one part of the instrument will affect it as a whole, and the player is haptically coupled with a complete single model (see Fig. 8.2 and Chap. 2).

Making use of this approach, one can distinguish:

**Fig. 8.2** Unitary approach to haptic digital musical instruments


MSCI fits into the latter category. The platform provides a musician-friendly physical modelling environment in which users can design virtual musical instruments, and allows unified multisensory interaction by simulating those instruments on a dedicated workstation that supplies coherent aural, visual and haptic feedback.

## **8.3 Hardware and Software Solutions for the MSCI Platform**

## *8.3.1 The TGR Haptic System*

The *transducteur gestuel rétroactif* (TGR) is a force-feedback device designed by the ACROE-ICA laboratory (Fig. 8.3). The first prototype was proposed by Florens in 1978 [12], conceived specifically for the requirements of artistic creation, in particular for instrumental arts such as music. The first goal of the TGR is to render the dynamic qualities of mechanical interactions with simulated objects with the highest possible fidelity: to this end, it offers both a high mechanical bandwidth (up to 15 kHz) and high peak force feedback (up to 200 N per degree of freedom).

**Fig. 8.3** TGR haptic device. Left: a bowing end-effector; right: a 12-key TGR with keyboard end-effectors

Several *slices* (1-DoF modular electromechanical systems, each composed of a sensor and an actuator) can be combined, allowing for any number of force-feedback-enabled degrees of freedom [14]. The device employed in the MSCI platform gathers 12 independent modules that can be combined with various mechanical end-effectors, forming 1D, 2D, 3D or even 6D morphological configurations, adapted to the diverse nature of instrumental gestures such as striking, bowing, plucking and grasping.

## *8.3.2 The CORDIS-ANIMA Formalism*

CORDIS-ANIMA [5] is a modular formalism for modelling and simulating mass-interaction networks, that is, physical models described by Newtonian point-based mechanics. It defines two main module types:


CORDIS-ANIMA incorporates the notion of physical coupling between networks of elementary modules through the interdependence of two dual variables: position, an *extensive variable* that gives <MAT> modules a position in space, and force, an *intensive variable* that originates from interactions between <MAT> modules described by <LIA> modules. Computing the network requires a closed-loop calculation: first, of the new positions of <MAT> modules, and second, of all the forces produced by the <LIA> modules according to the new positions of the <MAT> modules that they are connected to (Fig. 8.4).
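The two-phase closed loop described above can be sketched for the smallest possible network: one <MAT> mass attached to a fixed point by one spring-damper <LIA>. The discrete update rules used here (a second-order finite-difference mass and a spring-damper force computed from current and previous positions, with parameters normalised to the time step) are an assumption based on the usual textbook form of mass-interaction schemes; the function name and parameter values are hypothetical.

```python
def simulate_oscillator(n_steps, mass=1.0, stiffness=0.01, damping=0.0, x0=1.0):
    """One mass tied to a fixed point by a spring-damper (illustrative sketch).

    All parameters are expressed in units normalised to the simulation step.
    """
    x_prev, x = x0, x0          # equal samples: zero initial velocity
    force = -stiffness * x      # force implied by the initial positions
    positions = []
    for _ in range(n_steps):
        # <MAT> phase: new position from the force accumulated so far.
        x_next = 2.0 * x - x_prev + force / mass
        x_prev, x = x, x_next
        # <LIA> phase: new force from the freshly computed positions.
        force = -stiffness * x - damping * (x - x_prev)
        positions.append(x)
    return positions


undamped = simulate_oscillator(2000)
damped = simulate_oscillator(2000, damping=0.01)
```

With these values the undamped oscillator keeps a constant amplitude, while the damped one decays towards rest, which is the behaviour expected from a stable closed loop between position and force.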

Several CORDIS-ANIMA implementations exist for different geometrical spaces: 1D with scalar distances, or 1D, 2D and 3D with Euclidean distances. The 1D scalar distance version is generally used to simulate vibroacoustic deformations in which all <MAT> modules move along a single scalar axis. Models built in this way are topological networks that may represent a first-order approach to vibratory deformations as found in musical instruments, a simplification that works well in most cases.

For sound-producing physical models, networks must be simulated at audio-rate frequency (generally set at 44.1 kHz) in order to faithfully represent acoustical deformations that occur in the audible range (up to 20 kHz). Non-vibrating models, designed to, e.g. produce visual motion or mechanical systems, are often simulated in 1D, 2D or 3D geometrical spaces and at lower frequencies in the range 1–10 kHz, a bandwidth suited to instrumental performance.

The TGR haptic device is represented in CORDIS-ANIMA as a <MAT> module: this reports positions taken from its sensors and receives forces from the connected <LIA> modules which are then sent to the TGR's actuators.

## *8.3.3 The GENESIS Software Environment*

GENESIS [9] is ACROE-ICA's modelling and simulation software for musical creation. It allows users to model vibrating objects, from elementary oscillators to complex musical scenes, and to simulate them off-line at 44.1 kHz. GENESIS implements the 1D version of CORDIS-ANIMA, meaning that all <MAT> physical modules move along a single scalar axis conventionally labelled *x*.

**Fig. 8.5** Representation of a physical model in the GENESIS environment

**Fig. 8.6** Simulation of a GENESIS model, showing displacement along the *x*-axis

The modelling interface consists of a workbench representing the *y*-*z* plane, where <MAT> modules can be placed and interconnected through <LIA> modules to form topological networks (Figs. 8.5 and 8.6). Modules are given physical parameters that dictate their physical behaviour and initial conditions (initial position and speed of <MAT> modules).

## *8.3.4 Synchronous Real-Time Computing Architecture*

The vast majority of available haptic devices communicate asynchronously with physical simulations [11, 30]. Generally, the haptic loop runs locally at approximately 1 kHz, whereas other model components are computed at a lower rate and with less demanding latency constraints, following a distributed approach. Current general-purpose computer architectures are perfectly suited for these applications. However, when striving for energetically coherent instrumental interaction between the user and the simulated object, the communication between the haptic device and the simulation plays a key role.

As underlined in Sect. 8.3.2, the global physical entity composed of the force-feedback device and virtual object can be defined as a physical, energy-conserving system only if the haptic position and force data streams integrate seamlessly into the CORDIS-ANIMA closed-loop simulation. To this end, the haptic loop must run synchronously at the rate of the physical simulation, with single-sample latency between its position output and force input. For simulations running at several kHz, the time step (approximately 20–100 µs) within which AD/DA conversions, bidirectional communication with the haptic device and a single computation loop for the whole physical model must occur imposes a reactive computing architecture with guaranteed response time, which is not attainable by general-purpose machines [10].

**Fig. 8.7** Hardware and software architecture of the MSCI platform
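The order of magnitude of this time budget follows directly from the simulation rate, as the short calculation below shows. The helper name `step_period_us` is hypothetical; the two rates are the audio rate and the upper gestural rate mentioned in this chapter.

```python
def step_period_us(rate_hz):
    """Duration of one simulation step, in microseconds."""
    return 1e6 / rate_hz


for rate_hz in (10_000, 44_100):
    print(f"{rate_hz} Hz -> {step_period_us(rate_hz):.1f} us per step")
# 10000 Hz -> 100.0 us per step
# 44100 Hz -> 22.7 us per step
```

Everything in the loop (conversion, communication and model computation) has to fit within this period at every step, which is why a guaranteed response time matters more than average throughput.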

Additionally, the simulation of physical models sufficiently complex for musical purposes is computationally demanding and therefore ill-fitted for calculation on most current embedded systems. A previous simulation architecture at ACROE-ICA [19] was based on the TORO board from Innovative Integration; while it allowed running the haptic loop synchronously at audio rate (44.1 kHz), the available processing power limited the system to small-scale physical models [20].

The hardware and software architecture of the MSCI platform (shown in Fig. 8.7) consequently addresses both the need for high computing power and reactive I/O. It is based on the RedHawk Linux real-time operating system (RTOS), where the physical simulation is computed in two sections: one running at audio rate (44.1 kHz) and the other running at control (gestural) rate (1–10 kHz). The TORO DSP board serves as a front-end for haptic I/O. Sound is handled by an external soundcard. These components are synchronised through a shared master clock (the soundcard's wordclock). Visualisation data, on the other hand, is processed asynchronously so as to display the physical model during simulation.

This platform can simulate virtual scenes with up to 7000 interacting audio-rate physical modules: an approximate performance gain by a factor of 50 compared to the previous embedded architecture.

**Fig. 8.8** Analysis of the musician/instrument ensemble as a dynamical system

## **8.4 Multi-rate Decomposition of the Instrumental Chain**

The MSCI architecture is based on the idea of decomposing a physical model into a section running at audio rate and another one running at a lower gestural rate. In what follows we discuss the motivations for this decomposition, and how it can be addressed in the CORDIS-ANIMA framework while retaining physical coupling between the two sections of the physical model.

## *8.4.1 Gesture–Sound Dynamics*

The mechanics of traditional instruments present a natural cohabitation of multiple dynamics. In particular, instruments can be generally separated into:


These two sections are coupled by means of nonlinear interactions (percussion, friction, plucking, etc.) that transform low-bandwidth gesture energy into high-bandwidth energy of acoustical vibrations (Fig. 8.8).

Since these two sections of an instrument operate at different frequency rates, it comes naturally to simulate their discrete-time representations at different sampling rates. While this results in computational optimisation, a major issue arises: how to retain coherent physical coupling between the low-rate and high-rate sections of the instrument and at the same time meet the constraints of synchronous simulation?

## *8.4.2 Multi-rate CORDIS-ANIMA Simulations*

#### **8.4.2.1 Multi-rate Closed-Loop Dynamic Systems**

The physical coupling between two sections of a CORDIS-ANIMA model simulated at different rates brings forth two main questions: (i) how to ensure transparent communication of the position and force variables between the two discrete-time systems in order to represent the physical coupling between them, and (ii) how to limit the bandwidth of position and force signals when transiting from one simulation space to the other? For instance, if no band-limiting is applied to the higher rate signals before passing them to the low-rate section, aliasing is produced.

At first glance, the latter seems to be an elementary signal processing issue, solvable by using up- and down-sampling and low-pass digital filtering. However, the physical simulation imposes strict constraints on the operators that can be used: it is a closed-loop system in which force and position variables are coupled within a single simulation step. In other words, a maximum delay of one sample is allowed between all the inputs and outputs, while any additional delay alters the physical consistency of the system and considerably affects the numerical stability of the simulation [22]. This prevents using many standard signal processing tools for up- and down-sampling and digital filtering, as the vast majority of them introduce additional delays.

#### **8.4.2.2 Inter-Frequency Coupling Operators**

To address the above issue, up- and down-sampling of position and force variables travelling between the high- and low-rate sections must rely on delay-free (*zero-order*) operators, even though they necessarily introduce a trade-off in terms of quality of the reconstructed signals. The operators were chosen in accordance with the nature of the variables and their integration into the CORDIS-ANIMA computational scheme, so as to preserve the integrity of the physical quantities circulating inside the multi-rate simulation.

The two types of connections allowed by these operators are given in Fig. 8.9, where *XLF* and *FLF* represent, respectively, the low-rate position and force signals, whereas *XHF* and *FHF* represent the high-rate signals. Since no delay is introduced, the closed-loop nature of the simulation is preserved.

Theory and experiments demonstrate that a multi-rate model implemented in this manner behaves identically to an equivalent low-rate model in terms of numerical stability, provided that the model operates only in the lower frequency range. However, an inevitable consequence of using these operators is that high-rate signals are distorted. If left untreated, these distortions make the system completely unusable. Consequently, a solution has to be found to filter out unwanted artefacts, while once again avoiding any delay in the position–force closed loop.

**Fig. 8.9** Two inter-frequency coupling schemes with delay-free up- and down-sampling operators (HF stands for high frequency, LF for low frequency)
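To make the idea of delay-free operators concrete, the sketch below shows the simplest candidates: a zero-order hold for up-sampling and plain sample picking for down-sampling, both for an integer rate ratio. These are generic textbook operators used here as an assumption; the function names are hypothetical, and the exact operators of Fig. 8.9 may differ, in particular in how position and force are treated.

```python
def upsample_hold(low_rate_signal, factor):
    """Zero-order hold: repeat each low-rate sample `factor` times (no delay)."""
    return [sample for sample in low_rate_signal for _ in range(factor)]


def downsample_pick(high_rate_signal, factor):
    """Keep one high-rate sample per low-rate step, without any filtering."""
    return list(high_rate_signal[::factor])


# A low-rate ramp becomes a staircase at the high rate: the steps are the
# distortion that later has to be filtered out.
ramp = [0.0, 1.0, 2.0]
staircase = upsample_hold(ramp, 4)
print(staircase)
# [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0]
```

Neither operator looks ahead or buffers samples, so no latency is added to the position-force loop. The price is the staircase shape on the way up and the absence of band-limiting on the way down.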

#### **8.4.2.3 Low-Pass Filtering by Means of Physical Models**

Fortunately, CORDIS-ANIMA models can act as filtering structures [18]. As a basic example, a simple mass–spring oscillator excited by an input force signal can be regarded as a second-order low-pass filter whose transfer function can be expressed explicitly in terms of physical parameters [17]. This property has, for instance, been used to build small virtual physical systems that smooth noise in the position data provided by the TGR's sensors [19].
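The low-pass behaviour of a single mass-spring-damper cell can be checked numerically. The sketch below assumes the usual second-order finite-difference scheme, m·(x[n+1] − 2x[n] + x[n−1]) = f[n] − k·x[n] − z·(x[n] − x[n−1]), with parameters normalised to the time step. This scheme, the function name and the parameter values are assumptions for illustration and are not taken from [17].

```python
import cmath
import math


def oscillator_gain(freq_hz, sample_rate, mass, stiffness, damping):
    """Magnitude of the force-to-position response of the assumed scheme."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    z = cmath.exp(1j * w)
    denominator = (mass * (z - 2.0 + 1.0 / z)
                   + stiffness
                   + damping * (1.0 - 1.0 / z))
    return abs(1.0 / denominator)


# Hypothetical normalised parameters, evaluated at an audio rate of 44.1 kHz.
params = dict(sample_rate=44_100, mass=1.0, stiffness=0.01, damping=0.1)
for f in (0.0, 10_000.0, 20_000.0):
    print(f, oscillator_gain(f, **params))
```

At 0 Hz the gain equals 1/k (the static compliance of the spring), and it falls off well above the natural frequency, which is the second-order low-pass behaviour referred to above.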

It is thus possible to design *physical* low-pass filters that are as transparent as possible within the low-rate bandwidth, and present a sharp cut-off before the low-rate Nyquist frequency. We have modelled such filters as propagation lines (mass–spring chaplets) with specific mass, stiffness and damping distribution. They are used to eliminate distortion generated by the up-sampling operators and serve as anti-aliasing filters for the circulation of high-rate signals towards low-rate sections, while preserving physical consistency. Careful tuning and scaling of the filtering structures ensure minimal impact on the mechanical properties of the simulated object (e.g., in terms of added stiffness, damping and inertia).

#### **8.4.2.4 Complete Multi-rate Haptic Simulation Chain**

Figure 8.10 presents the complete multi-rate haptic simulation chain as implemented in the MSCI platform. An instrument is decomposed into a lower-bandwidth *gestural* section and a higher-bandwidth vibrating section simulated synchronously at audio rate. The two sections are coupled through multi-rate operators, a filtering mechanism and a nonlinear interaction that transforms gestural energy into vibroacoustic deformations. Physical energy is conserved throughout the system, ensuring computation stability.

**Fig. 8.10** Complete multi-rate haptic simulation chain of the MSCI Platform

These solutions combined allow establishing a true energetic bridge between the *real*-*world* user and the *simulated* instrument, supporting the *ergotic* function of musical gestures, as defined by Cadoz and Wanderley [6, 7] (see also Chap. 2).

## **8.5 Virtual Instruments Created with MSCI**

## *8.5.1 Workflow and Design Process*

Creating physical models in MSCI is similar to classic modelling with GENESIS, especially concerning the design of vibrating sections of the instrument. The haptic device is integrated directly into the CORDIS-ANIMA model as a series of TGR <MAT> modules, one for each allocated 1-DoF. However, designing haptic DMIs in this way presents a number of specific concerns:


Details concerning calibration and impedance matching are described in [20], and various instrument designs are discussed in [21].

## *8.5.2 Specificities of MSCI Haptic Virtual Instruments*

Since the first release of the MSCI platform in 2015, over 100 virtual instruments have been created by the authors, students and the general public. The computing power of modern systems has made it possible, for the first time, to simulate and interact haptically with large-scale instruments composed of thousands of interacting modules. Figure 8.11 shows an example of such models. Especially for large structures with nonlinear acoustical behaviour, such as membranes or cymbals, exploration through real-time manipulation greatly facilitates the iterative design and fine-tuning process.

One notable feature of MSCI's models is their rich and complex response to different categories of musical gestures [6]. Indeed, as the entire instrument is modelled physically with CORDIS-ANIMA, the user has access to each single simulated point of physical matter. This is not possible in more encapsulated or global physical modelling techniques such as digital waveguides [32] or modal synthesis [1]. Such access allows for subtle and complex control of the virtual instrument using various haptic modules for different gestures. In the case of a simple string, the excitation gesture could be, e.g. plucking, striking or bowing, whereas modification gestures could be, e.g. pinning down the string onto the fretboard to change its length and pitch (as shown in Fig. 8.12), gently applying pressure onto specific points of the open string to obtain natural harmonics, applying pressure near the bridge to "palm mute" the string or even dynamically moving the bridge or the tuning peg of the string to change its acoustical properties over time.

Demonstration sessions and feedback from users tend to strongly confirm the importance of tight physical interaction with the virtual instruments. Even the simplest models can yield a wide palette of sonic possibilities, often leading users to spend a fairly long time (up to 30 min) exploring the dynamics, playing modalities and haptic response of a single instrument. This fine degree of control enables an *enactive* learning process of getting to know an object (a virtual instrument in this case) through physical manipulation.

**Fig. 8.12** Plucked string model. Above: during plucking interaction; middle: open vibration of the string; below: pinning the string onto a fretboard, shortening its vibrating length

## *8.5.3 Real-Time Performance in Hélios*

*Hélios* is an interactive musical and visual piece that was created for the AST 2015<sup>1</sup> festival. For the first time, an MSCI force-feedback station was used in a public live performance. The entire musical content and the visual scenes are created with GENESIS, associating a vast pre-calculated physical model with a real-time MSCI simulation. Video content is projected onto two screens: a large screen for the calculated visual scenes and a screen for the real-time visuals associated with the MSCI simulation. The sound projection is handled with a sound dome of 24 speakers, placed in a semi-sphere above the audience.

<sup>1</sup>Art—Science—Technologie—November 14-21, 2015—Grenoble, France.

**Fig. 8.13** Complete physical model for Hélios (approximately 200000 modules)

The pre-calculated virtual instrumental scene in *Hélios* is composed of approximately 200000 GENESIS modules (Fig. 8.13). The off-line simulation of this vast instrumental scene allows:


The MSCI system incorporated into the installation uses a 12 DoF force-feedback device (Fig. 8.3). A model made of approximately 7000 physical modules is loaded onto the MSCI workstation. This model is a subgroup of the entire model, which guarantees coherency between the sound textures produced by the off-line simulation and those produced during the real-time interaction with the MSCI virtual instrument. This fusion blurs the boundaries between off-line and real-time sections and offers rich possibilities for the composition and musical structure in the temporal, spatial and structural dimensions of the piece.

The described configuration illustrates one of the many possible interaction scenarios between real and virtual players, real and virtual instruments, real-time and off-line ("supra-instrumental") instrumental situations, as previously described by Cadoz [8].

## **8.6 Conclusions**

We have presented and discussed recent solutions developed at ACROE-ICA for designing and implementing multisensory virtual musical instruments. These converged into the MSCI platform, the first modelling and simulation environment of its kind, enabling large-scale computation of physical models and synchronous high-performance haptic interaction that retains the *ergotic* qualities of musical gestures in a digital context.

Several scientific and technological questions have been addressed by this work, in particular concerning the formalisation and implementation of physical models containing sections running at different rates. The models created so far and feedback from users lead us to believe that MSCI offers high potential as a musical *metainstrument*, and that it is suitable for use in live performances, as demonstrated by Claude Cadoz in his two performances of *Hélios*.

Further developments will include incorporating mixed interaction between user manipulation and virtual agents inside the physical models. Most importantly, MSCI will be used in various creative contexts by musicians and composers and in pedagogical contexts to teach about physics, acoustics and haptics.

## **References**



# **Chapter 9 Force-Feedback Instruments for the Laptop Orchestra of Louisiana**

**Edgar Berdahl, Andrew Pfalz, Michael Blandino and Stephen David Beck**

**Abstract** Digital musical instruments yielding force feedback were designed and employed in a case study with the Laptop Orchestra of Louisiana. The advantages of force feedback are illuminated through the creation of a series of musical compositions. Based on these and a small number of other prior music compositions, the following compositional approaches are recommended: providing performers with precise, physically intuitive, and reconfigurable controls, using traditional controls alongside force-feedback controls as appropriate, and designing timbres that sound uncannily familiar but are nonetheless novel. Video-recorded performances illustrate these approaches, which are discussed by the composers.

## **9.1 Introduction**

Applications of force feedback for designing musical instruments have been studied since as early as 1978 at ACROE [14, 17, 21, 36]; Chap. 8 reports on recent advancements. Such works provide a crucial reference for understanding the role that haptic technology can play in music. The wider computer music community has also demonstrated a sustained interest in incorporating force-feedback technology into musical works, as evidenced by a series of projects during recent decades.

E. Berdahl (B) · A. Pfalz · M. Blandino · S. D. Beck

School of Music & CCT—Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803, USA e-mail: edgarberdahl@lsu.edu

A. Pfalz e-mail: apfalz1@lsu.edu

M. Blandino e-mail: mblandi@lsu.edu

S. D. Beck e-mail: sdbeck@lsu.edu

Gillespie et al. have created some high-quality custom force-feedback devices and used them for simulating the action of a piano key [24, 26]. Verplank and colleagues, and Oboe et al. have initiated separate efforts in repurposing old hard drives into force-feedback devices for music [43, 55]. More recently, the work by Verplank and colleagues has been extended via a collaboration with Bak and Gauthier [2]. Several human–computer interface researchers have experimented with using motorized faders for rendering force feedback [48], even for audio applications [1, 23, 54]. The implementation of a force-feedback bowed string has also been studied in detail using various force-feedback devices [21, 37, 42, 49].

More recently, Kontogeorgakopoulos et al. have studied how to realize digital audio effects with physics-based models, for the purpose of creating force-feedback musical instruments [32, 33]. Also, Hayes has endowed digital musical instruments (DMIs) with force feedback using the NovInt Falcon [28]. Most recently, Battey et al. have studied how to realize generative music systems using force-feedback controllers [3].

## *9.1.1 Multisensory Feedback for Musical Instruments*

As described in Chap. 2, when a performer plays a traditional musical instrument, he or she typically receives auditory, visual, *and* haptic feedback from the instrument. By integrating information from these feedback modalities together [15, 39], the performer can more precisely control the effect of the mechanical excitation that he or she provides to the instrument (see Fig. 9.1).

Most digital musical instruments have primarily aimed at providing auditory and visual feedback [40]. However, haptic force feedback is an intriguing additional modality that can provide performers with enhanced feedback from a DMI. It has advantages such as the following:



## *9.1.2 Additional Force-Feedback Device Designs from the Haptics Community*

Outside the realm of computer music, a wide variety of haptic devices, historically very expensive, have been created and researched. Many of these have been used for scientific visualization and/or applications in telerobotic surgery or surgical training [12, 16, 29, 35, 38]. The expense of these devices will prevent their use from ever trickling down to large numbers of practicing musicians, but they are useful for research in haptics.

For instructional purposes, several universities have made simple haptic force-feedback devices that are less expensive. For example, the series of "Haptic Paddles" are single degree-of-freedom devices based upon a cable connection to an off-the-shelf DC motor [44]. However, such designs tend to be problematic because of the unreliable supply of surplus high-performance DC motors [25]. In contrast, the iTouch device at the University of Michigan instead contains a voice coil motor, which is hand wound by students [25]. However, making a large number of devices is time intensive, and the part specifications are not currently available in an open-source hardware format.

## *9.1.3 Open-Source Technology for the Design of Haptic Musical Instruments*

Force-feedback technologies tend to be rather complex. Consequently, small-scale projects have been hampered as the technological necessities have required so much attention that little time remained for aesthetic concerns. Furthermore, practical knowledge needed for prototyping haptic musical instruments has not been widely available, which has made it even more challenging for composers to access the technology.

In response, Berdahl et al. have created an open-source repository,<sup>1</sup> which contains simple examples that provide insight into the design of haptic musical instruments. These examples are built upon a series of open-source tools that can be used to rapidly prototype new haptic musical instruments. The main projects within the repository are the following:


Workshops have been taught at a series of international conferences using the repository.

<sup>1</sup>https://github.com/eberdahl/Open-Source-Haptics-For-Artists (last accessed on August 16, 2017).

<sup>2</sup>The functionality of Max is extended by *abstractions*, which are custom-defined objects that encapsulate program code.

**Fig. 9.2** FireFader is a force-feedback device with two motorized faders. It uses open-source hardware and is based on the Arduino platform, so it can easily be reconfigured for a wide variety of applications

## *9.1.4 Laptop Orchestra of Louisiana*

Since its inception, the so-called *laptop orchestra* has become known as an ensemble of musicians performing using laptops. Precisely what qualifies as a laptop orchestra is perhaps a matter of debate, but historically they seem to be configured similarly to the original Princeton Laptop Orchestra (PLOrk). As described by Dan Trueman in 2007, PLOrk was then comprised of fifteen *performance stations* consisting of a laptop, a six-channel hemispherical loudspeaker, a multichannel sound interface, a multichannel audio power amplifier, and various additional commercial music controllers and custom-made music controllers [51, 52].

The Laptop Orchestra of Louisiana (shown in Fig. 9.3) was created in 2011 and originally consisted of five performance stations. Since then, it has been expanded to include ten performance stations and a server. Organizationally, the ensemble aims to follow in the footsteps of PLOrk and the Stanford Laptop Orchestra (SLOrk) by leveraging the integrated classroom concept, which encourages students to naturally and concurrently learn about music performance, music composition, programming, and design [56]. The Laptop Orchestra of Louisiana further serves the local community by performing repertoire written by both local students and faculty [50].

As opposed to composing for traditional ensembles, whose formation is usually clearly defined, composing for laptop orchestra is generally a very open-ended activity. Some authors even consider composing for laptop orchestra to be an ill-defined problem [19]. An informative swath of repertoire now exists for laptop orchestras, and other ideas may be drawn from the history of experimental music. Due to its open-ended nature, treating the process of composing for laptop orchestra as a design activity can be fruitful. Specifically, early prototyping and iteration activities can be helpful in providing insight [19]. This kind of thinking is also helpful when designing virtual instruments for haptic interaction. The authors are working on this endeavor not only by prototyping, iterating, and refining interaction designs into music compositions, but also by expanding and honing the content available in the Open-Source Haptics for Artists repository [6, 7, 9, 11].

**Table 9.1** Some of the virtual objects implemented by Synth-A-Modeler

In 2013, students at Louisiana State University built a FireFader for each performance station. A laser-cut enclosure design was also created (see Fig. 9.2) to provide performers with a place to rest their hands. Then students and faculty started composing music for the Laptop Orchestra of Louisiana with FireFaders. This chapter reports on some ideas for composing this kind of music, as informed by the outcomes of these works. The following specific approaches are suggested: providing performers with precise, physically intuitive, and reconfigurable controls, using traditional controls alongside force-feedback controls as appropriate, and designing timbres that sound uncannily familiar but are nonetheless novel.

**Fig. 9.3** Laptop Orchestra of Louisiana performing in the DigitalMedia Center Theater at Louisiana State University

## **9.2 Enabling Precise and Physically Intuitive Control of Sound ("Quartet for Strings")**

Compared with other electronic controls for musical instruments, such as buttons, knobs, sliders, switches, and touchscreens, force-feedback devices have the ability to *provide performers with precise, physically intuitive, and programmable control*. To achieve this, instruments need to be carefully designed so that they both feel good and sound good. It is helpful to carefully match the mechanical impedance of the instruments to the device and performers, and it is recommended to apply the principle of *acoustic viability*.

Demonstrating these characteristics, *Quartet for Strings* by Stephen David Beck is a quartet written for four virtual vibrating strings. Each of these strings is played by a single performer using a FireFader as depicted in Fig. 9.4. To match the structure of a traditional string quartet, the instruments are similarly scaled to allow different performers to play different pitch ranges. This results in four different virtual instrument scales: first violin, second violin, viola, and cello.

## *9.2.1 Instrument Design*

#### **9.2.1.1 Acoustic Viability**

*Acoustic viability* is a digital design principle that recognizes the importance of integrating nuance and expressive control into digital instruments, using traditional acoustic instruments as inspiration [4, 5]. Traditional acoustic musical instruments have been refined over long periods, often spanning performers' lifetimes, whole centuries, or even longer. Consequently, traditional instruments tend to exhibit complex mechanics for providing performers with nuanced, precise, expressive, and perhaps even intimate control of sound [4].

**Fig. 9.4** *Quartet for Strings* is for a quartet of FireFaders and laptops, each of which enables a performer to play a virtual vibrating string

However, these nuanced relationships are sometimes lacking in simple signal processing-based or even physics-based synthesizer designs, because significant effort is required during synthesizer design in order to afford nuance and expressive control. Therefore, for a digital instrument to be *acoustically viable*, it has been suggested that the synthesizer designer should implement cross-relationships between parameters such as amplitude, pitch, and spectral content [4, 5]. For example, designers can consider how changes in amplitude could affect the spectral centroid and vice versa [4].

With physics-based modeling, such cross-relationships will tend to be clearly evident if strong nonlinearities are present in a model. For example, if a lightly damped material exhibits a stiffening spring characteristic, then the *pitch modulation* effect will tend to result in these kinds of cross-relationships. This kind of effect can be observed in many real chordophones, membranophones, and idiophones [20].
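The pitch modulation effect can be reproduced with a single mass on a stiffening spring. The following sketch is not a Synth-A-Modeler model: it assumes a cubic stiffening law F = -k·x - k3·x^3, hypothetical parameter values, and a simple semi-implicit Euler integrator, and it estimates the oscillation frequency from upward zero crossings. Under these assumptions, a larger amplitude should yield a higher pitch, which is the kind of amplitude–pitch cross-relationship described above.

```python
import math


def oscillation_frequency(amplitude, k=1.0e4, k3=1.0e10, mass=1.0e-3,
                          rate=44100, cycles=20):
    """Estimate the frequency (Hz) of a mass released from `amplitude` (m).

    The restoring force is -k*x - k3*x**3 (assumed cubic stiffening law).
    All default values are illustrative, not taken from the chapter.
    """
    if amplitude <= 0.0:
        raise ValueError("amplitude must be positive")
    dt = 1.0 / rate
    x, v, t = amplitude, 0.0, 0.0
    crossings = []
    for _ in range(rate * 10):           # safety bound: ten simulated seconds
        force = -k * x - k3 * x ** 3
        v += dt * force / mass           # semi-implicit (symplectic) Euler
        x_new = x + dt * v
        if x < 0.0 <= x_new:             # upward zero crossing
            # Linear interpolation of the crossing time within this step.
            crossings.append(t + dt * (-x) / (x_new - x))
            if len(crossings) == cycles + 1:
                break
        x, t = x_new, t + dt
    if len(crossings) < cycles + 1:
        raise RuntimeError("not enough oscillation cycles were observed")
    return cycles / (crossings[-1] - crossings[0])


soft = oscillation_frequency(1.0e-5)     # nearly linear regime
loud = oscillation_frequency(1.0e-3)     # cubic term comparable to linear term
print(f"small amplitude: {soft:.1f} Hz, large amplitude: {loud:.1f} Hz")
```

With light damping added, the pitch would glide downward as the amplitude decays, which is the audible signature of the effect.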

Accordingly for *Quartet for Strings*, it was decided to create a plucked string instrument that exhibited tension modulation by interspersing masses with stiffeninglink objects as shown in Fig. 9.5 [8, 20]. As with related force-feedback instruments, the right-hand side FireFader knob can be used to pluck the string (see Fig. 9.5, right). However, it was desired to also control the pitch of the string using the FireFader. This was achieved by making the string very loose or "slack" and then using the left-hand side FireFader knob to simultaneously touch all of the string masses. For more information on how the stiffeninglink objects are parameterized, the reader is referred to a prior publication [8]. A demonstration video helps to illustrate how this instrument leverages the principle of *acoustic viability* to realize physically intuitive and expressive control.<sup>3</sup>

**Fig. 9.5** String model GooeyStringPitchModBass in Synth-A-Modeler consists of forty masses, interconnected by stiffeninglink objects and terminated by ground objects (see Table 9.1). The fader knob on the right-hand side is used to pluck one of the masses. The fader knob on the left-hand side is used to depress all of the masses simultaneously, which gradually increases the pitch

#### **9.2.1.2 Impedance Matching**

Impedance matching is a technique in which the impedances of two interacting objects are arranged to be similar to each other. This allows optimal energy exchange between them. As explained in Sect. 2.2, in the musician–instrument interaction, impedance matching ensures effective playability and tight coupling.

In the model GooeyStringPitchModBass, the mass of the virtual model (here, the string) needs to be approximately matched to the combined mass of a hand holding a fader knob. This is achieved by setting each virtual mass to 1 g. Since the string comprises 40 masses, its total mass is 40 g, which is comparable to the combined mass of a hand holding a fader knob.

## *9.2.2 Performance Techniques*

Two special performance techniques further exploit the precise and physically intuitive control afforded by the designed instruments.

#### **9.2.2.1 Pizzicato with Exaggerated Pitch Modulation**

First, a performer can fully depress the string and then quickly release it. Then the force feedback rapidly moves the left-hand side fader knob back to a resting position. The sound of this technique is reminiscent of a Bartók pizzicato, except that the pitch descends considerably and rapidly during the attack. In *Quartet for Strings*, this can be heard after the first introduction of the cello instrument.

<sup>3</sup>https://cct.lsu.edu/eberdahl/V/DemoOfASlackString.mov (last accessed on August 16, 2017).

It should be noted that this technique can be used expressively only because of the virtual nature of the string's implementation. The authors are not aware of any real strings that demonstrate such a strong stiffening characteristic, do not break easily, and could be reliably performed without gradual detuning of the pitch that the string tends toward upon release.

#### **9.2.2.2 Force-Feedback Jeté**

A second special technique emerges when a performer lightly depresses the left-hand side knob to lightly make contact with the virtual string. The model responds accordingly with force feedback to push the knob in the opposite direction (against the performer's finger). When the pressure the performer exerts and the response the model synthesizes are balanced in a particular proportion, the fader and instrument become locked together in a controlled oscillation. This oscillation can be precisely controlled through the physically intuitive connection with the performer. This technique is used extensively near the end of the piece. On the score, this technique is indicated using the marking *jeté*, giving a nod to the violin technique with the same name.
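The force that pushes back against the performer's finger can be pictured as a one-sided contact. The sketch below is a generic penalty-style contact (a spring and damper that act only while the knob penetrates the string), written under the assumption that the touch interaction behaves roughly this way. The function name, sign convention, and parameter values are illustrative and are not taken from Synth-A-Modeler.

```python
def contact_force_on_knob(knob_pos, string_pos, knob_vel=0.0, string_vel=0.0,
                          stiffness=500.0, damping=0.5):
    """Force (N) on the knob, which presses in the +x direction.

    Zero when the knob has not reached the string; otherwise a
    spring-damper force that can only push the knob back, never pull it.
    """
    penetration = knob_pos - string_pos
    if penetration <= 0.0:
        return 0.0                                    # no contact
    force = -(stiffness * penetration
              + damping * (knob_vel - string_vel))
    return min(force, 0.0)                            # contact cannot pull


print(contact_force_on_knob(-0.001, 0.0))             # knob not touching
print(contact_force_on_knob(0.002, 0.0))              # knob pushed back
```

Because such a force switches on and off with contact, a light and steady finger pressure can plausibly settle into repeated bouncing between knob and string, which is one way to read the controlled oscillation described above.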

## *9.2.3 Compositional Structure*

*Quartet for Strings* is composed as a modular piece with three-line staves representing relative pitch elements (see Fig. 9.6). While precision of time and pitch is not critical to its performance, the piece was conceived as a composed work, not as an improvised one. It balances control over gesture and density with aleatoric arrangements of the parts.

In the sense that the score invites performers with less extensive performance experience to try to perform as expressively as possible, the authors believe that the score is highly effective in the context of a laptop orchestra. The score provides expressive markings to encourage the performers to try to fully leverage the acoustically viable quality of the instruments. At the same time, the score allows for some imprecision of the timing and pitches, freeing the performers from limiting their performance through precisely attending to strict performance requirements.

A studio video recording of *Quartet for Strings* is available for viewing at the project Web site, which demonstrates how the force feedback facilitates precise and physically intuitive control.<sup>4</sup>

<sup>4</sup>https://www.youtube.com/watch?v=l-29Xete1KM (last accessed on August 16, 2017).

**Fig. 9.6** Excerpt from *Quartet for Strings*

## **9.3 Traditional Controls Can Be Used Alongside Force-Feedback Controls ("Of Grating Impermanence")**

Different kinds of controls provide different affordances. In the context of laptop orchestra, where a variety of controls are available (such as trackpads, computer keyboards, MIDI keyboards, or even drum pads and tablets [51]), *traditional controls can be used appropriately alongside force-feedback controls*. For example, to help manage mental workload [41], buttons or keys can be used to change modes while force-feedback controls enable continuous manipulation of sound.

This approach is applied in *Of grating impermanence* by Andrew Pfalz. For this composition, each of the four performers plays a virtual harp with twenty strings (see Fig. 9.7), which can be strummed using a FireFader knob. As with *Quartet for Strings*, the performance of subtle gestures is facilitated by the force feedback coming from the device. The musical gestures are intuitive, comfortable, and feel natural to execute on the instruments.

## *9.3.1 Instrument Design*

The harp model incorporates both continuous control (via the faders) and discrete control (via the laptop keyboard). Due to this combination, performers can focus on dexterously making continuous musical gestures with the FireFader, while easily stepping through harp tunings using simple button presses. Specifically, the model shown in Fig. 9.7 is controlled as follows:


**Fig. 9.7** For *Of grating impermanence*, the harp model PluckHarp20 includes twenty strings that can be plucked using a single FireFader knob. Each of these strings is created by connecting a termination to a waveguide to a junction to a touch link to a second waveguide to a second termination (for more details, see Table 9.1)

dark and short, like a palm-muted guitar, to bright and resonant, like guitar strings plucked near their terminations.

• The right and left arrow keys of the laptop keyboards enable the performer to step forward or backward, respectively, through preprogrammed tunings for each of their twenty virtual strings. Consequently, the performers do not need to be continuously considering the precise tuning of the strings.

## *9.3.2 Performance Techniques*

#### **9.3.2.1 Simultaneously Changing the Chord and Strumming**

With training, the performers gravitate toward a particular performance technique, especially in sections of the composition with numerous chord changes. In these sections, the performers learn to use the following procedure: (1) wait for notes to decay, (2) use the arrow key to advance the harp's tuning to the next chord, (3) immediately strum the virtual strings using the FireFader, and (4) repeat. The ergonomics of this performance technique are illustrated in Fig. 9.8, which shows how each performer's right hand is operating a FireFader, while the left hand is operating the arrow keys (shown boxed in yellow in Fig. 9.8).

Visual feedback is further employed to help the performers stay on track. The index of each chord is displayed on the laptop screen in a large font, so that performers can error check their progress in advancing through the score.
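The discrete side of this control scheme amounts to a small state machine. The following minimal sketch shows one way it could be organized; the class name, the key names, the clamping behaviour at the first and last chord, and the tunings (given here as MIDI note numbers for three strings rather than twenty) are all assumptions made for illustration, not details of the actual patch.

```python
class TuningSequencer:
    """Steps through preprogrammed tunings with the arrow keys."""

    def __init__(self, tunings):
        if not tunings:
            raise ValueError("at least one tuning is required")
        self._tunings = [tuple(t) for t in tunings]
        self.index = 0

    def on_key(self, key):
        """Handle 'right' (next chord) or 'left' (previous chord)."""
        if key == "right":
            self.index = min(self.index + 1, len(self._tunings) - 1)
        elif key == "left":
            self.index = max(self.index - 1, 0)
        return self.current()

    def current(self):
        """Tuning that the virtual strings should currently use."""
        return self._tunings[self.index]

    def label(self):
        """Text for the large on-screen chord index."""
        return f"{self.index + 1}/{len(self._tunings)}"


# Hypothetical tunings; each tuple holds one pitch per string.
seq = TuningSequencer([(48, 55, 64), (50, 57, 65), (52, 59, 67)])
seq.on_key("right")
print(seq.label(), seq.current())
```

Keeping the chord index separate from the continuous fader input mirrors the division of labour between the two hands described above.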

#### **9.3.2.2 Accelerating Strums**

Preprogramming the note changes for banks of twenty plucked strings also enables a specialized strumming technique. Since each performer is passing the fader knob over so many strings, it is possible for the performer to noticeably accelerate or decelerate during a single strumming gesture. This technique aids in building tension during the first section of the composition. The authors would like to note that, although no formal tests have been conducted, they have the impression that the force feedback is crucial for this performance technique, as it makes it possible to not only hear but also feel each of the individual strings.

**Fig. 9.8** For *Of grating impermanence*, the performers use their right hands to pluck a harp of virtual strings and their left hands to press the arrow keys on the laptop keyboard (see the yellow rectangles above). The right arrow advances to the next chord for the harp, and the left arrow goes back to the previous chord
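The timing of an accelerating strum can be illustrated with a purely kinematic sketch. In the actual instrument the strings are plucked through a physical interaction with force feedback; here it is simply assumed that a string sounds when the fader position passes it. The even string spacing, the normalized fader travel, and the quadratic gesture are hypothetical choices for the example.

```python
def crossed_strings(prev_pos, new_pos, string_positions):
    """Indices of strings passed between two fader samples, in crossing order.

    string_positions must be sorted in ascending order.
    """
    indexed = list(enumerate(string_positions))
    if new_pos > prev_pos:
        return [i for i, p in indexed if prev_pos < p <= new_pos]
    if new_pos < prev_pos:
        return [i for i, p in reversed(indexed) if new_pos <= p < prev_pos]
    return []


# Twenty evenly spaced strings across a normalized fader travel of 0..1.
strings = [(i + 0.5) / 20 for i in range(20)]

# Hypothetical accelerating gesture: position grows with the square of time.
steps = 1000
pluck_steps = []
prev = 0.0
for n in range(1, steps + 1):
    pos = (n / steps) ** 2
    pluck_steps += [n for _ in crossed_strings(prev, pos, strings)]
    prev = pos

# Number of time steps between successive plucks; it shrinks as the
# gesture speeds up.
intervals = [b - a for a, b in zip(pluck_steps, pluck_steps[1:])]
print(intervals)
```

The shrinking intervals correspond to the audible acceleration within a single strum.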

#### **9.3.2.3 Continuous Control of Timbre for Strumming**

The second knob on each FireFader enables the performers to occasionally but immediately alter the timbre of the strings as indicated in the score. Because this technique is used sparingly, it has a stark influence upon the overall sound, and it is a powerful control that makes the instrument seem more lifelike. An additional distortion effect further influences the timbre of the strings, and this distortion is enabled and disabled by the arrow keys so as to match the printed score.

## *9.3.3 Compositional Structure*

*Of grating impermanence* is performed from a fixed score. The composition comprises several sections that demonstrate various performance techniques of the instrument. The score shows the notes that are heard, but each performer need only keep track of his or her place in the score, rather than actually selecting notes as on a traditional instrument. In this way, the job of the performer is similar to that of a member of a bell choir: following along in the score and playing notes at the appropriate times.

The beginning and ending sections of the composition are texturally dense and somewhat freer. The gestures and timings are indicated, but the precise rhythms are not notated. The interior sections are metered and fully notated. Stylistically, these sections range from monophony to interlocking textures to fast unison passages.

A studio video recording is available for viewing at the project Web site, which illustrates how these performance techniques are enabled by combining traditional controls and force-feedback controls.<sup>5</sup>

## **9.4 Finding Timbres that Sound Uncannily Familiar but Are Nonetheless Novel ("Guest Dimensions")**

When composing electroacoustic music, it can generally be useful to compose new timbres, which can help give listeners new listening experiences. In contrast, if timbres sound familiar to a listener, they can beneficially provide "something to hold on to" for less experienced listeners [34], particularly when pitch and rhythm are not employed traditionally. In the present chapter, it is therefore suggested that *finding timbres that sound uncannily familiar but are nonetheless novel* can help bridge these two extremes [13, 18].

<sup>5</sup>https://www.youtube.com/watch?v=NcxO1ChLcr0 (last accessed on August 16, 2017).

*Guest Dimensions* by Michael Blandino is a quartet that explores this concept, extending it by making analyzed timbres tangible using haptic technology. For example, each of the four performers uses a FireFader to pluck one of two virtual resonator models (see Fig. 9.9), whose original parameters are determined to match the timbre of prerecorded percussion sound samples.

## *9.4.1 Instrument Design*

#### **9.4.1.1 Calibrating the Timbre of Virtual Models to Sound Samples**

Two virtual resonator physical models were calibrated through modal decomposition of sound files of a struck granite block and of a gayageum, which is a Korean plucked string instrument [27, 30, 53]. This provided a large parameter set to use for starting the instrument design process.

#### **9.4.1.2 Scaling Model Parameters to Discover Novel Timbres**

Then, for each part and section of the composition, multiple model parameters were scaled with respect to the original estimated fundamental frequency, the original estimated decay times, reference mass values, pluck interaction stiffness, pluck interaction damping parameter, and virtual excitation location. It was discovered that even with the granite block, which did not have a harmonic tone, melodies could nonetheless be realized by scaling the modal frequencies over the range of a few octaves. This same approach was used to enable melodies to be played with the gayageum model.

Although performance techniques affected the timbre, the timbre could be more strongly adjusted via the model parameters. For example, to increase overall timbral interest and to increase sustain of the resonances, the decay times for the struck granite block sound were lengthened significantly, enhancing the resonance of the model. Further adjustment of the virtual excitation location and scaling of the virtual dimensions allowed for additional accentuation of shimmering and certain initial transient qualities. Similarly, the gayageum model's decay time was slightly extended, and its virtual excitation position was tuned for desired effects.
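The parameter scaling described above can be illustrated with a minimal sketch. It assumes that each resonator is represented as a list of modes and that the lowest modal frequency stands in for the estimated fundamental. The `Mode` class, the `retune` function, and the numeric values are hypothetical and are not taken from the actual *Guest Dimensions* patch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mode:
    frequency_hz: float  # modal frequency
    decay_s: float       # decay time of this mode
    gain: float          # relative amplitude


def retune(modes: List[Mode], target_f0_hz: float, decay_scale: float = 1.0) -> List[Mode]:
    """Scale all modal frequencies by one ratio and all decay times by another.

    Assumption: the lowest mode is used as the reference "fundamental".
    Because every mode is multiplied by the same ratio, the (possibly
    inharmonic) spectral shape is preserved while the pitch moves.
    """
    reference_hz = min(m.frequency_hz for m in modes)
    ratio = target_f0_hz / reference_hz
    return [Mode(m.frequency_hz * ratio, m.decay_s * decay_scale, m.gain) for m in modes]


# Hypothetical inharmonic mode set (illustrative values, not measured data).
block = [Mode(410.0, 0.30, 1.0), Mode(1127.0, 0.18, 0.6), Mode(2380.0, 0.09, 0.3)]

# Play the same timbre one octave lower, with decay times lengthened threefold.
note = retune(block, target_f0_hz=205.0, decay_scale=3.0)
```

Since the ratios between modal frequencies are unchanged, an inharmonic source such as the granite block keeps its character while still following a melody.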

This exploration of uncannily familiar yet novel timbres is evident when listening to the video recording of *Guest Dimensions* on the project Web site.<sup>6</sup> The reader should keep in mind that the range of somehow familiar timbres realized during the performance stems from the two originally calibrated models of a struck granite block and a plucked gayageum.

<sup>6</sup>https://www.youtube.com/watch?v=SrlZ\_RUXybc (last accessed on August 16, 2017).

**Fig. 9.9** For *Guest Dimensions*, the general modal synthesis model incorporates a resonators object that is plucked using a single FireFader knob (see Table 9.1)

#### **9.4.1.3 Visual Display of the Force-Feedback Interaction**

The FireFaders are not marked to indicate where the center points of the sliders are, which correspond to where the resonators were located in virtual space. Since *Guest Dimensions* calls for specific rhythms to be played, it was necessary to create a very simple visual display enabling the performers to see what they were doing. The display showed the position of the fader knob and the position of the virtual resonator that the fader knob was plucking. The authors have the impression that this display may have made it easier for the performers to play more precisely in time. Overall, the need for implementing visual displays for some music compositions is emphasized by the discussion in Sect. 9.1.1. Generally speaking, the implementation of additional feedback modalities has the potential to enable more precise control.

## *9.4.2 Performance Techniques*

Two plucking performance techniques in *Guest Dimensions* are particularly notable. Both are facilitated by the programmable nature of the force feedback, which enables the virtual model to be impedance matched differently for each performance technique. For example, the *tremolo* performance technique is enhanced through a decreased virtual plectrum stiffness, while the *legato* performance technique is enhanced through a moderately increased virtual plectrum stiffness.

#### **9.4.2.1 Tremolo**

In the first section of the composition, the stiffness of the pluck link (see Fig. 9.9 and Table 9.1) in the model is set to be relatively low. This haptic quality enables the performers to pluck back and forth across the virtual resonators object particularly rapidly, obtaining a tremolo effect. Especially rapid plucking results in a louder sound, while slower plucking results in a quieter sound. According to the indications in the score of *Guest Dimensions*, the performers use the tremolo technique to create a range of dynamics.

#### **9.4.2.2 Legato**

In the sections not involving tremolo, the performers are mostly plucking more vigorously in a style that could be called *legato*. In those sections, the performers are playing various, interrelated note sequences. Instead of providing the performers with manual control over changing the notes (as with *Of grating impermanence*), it was decided that it would be more practical to automate the selection of all of the notes. Accordingly, the following approach was used to trigger note updates: right before one of the models is plucked, in other words right as the fader knob is approaching the center point for the plectrum, the next corresponding fundamental frequency is read out of a table and used to rapidly scale the fundamental frequency of the model. Careful adjustment of the threshold point is needed to avoid pitch changes during the resonance of prior attacks or changes after new attacks. Performers develop an intuition for avoiding false threshold detection through confident plucking. An advantage of this approach is that performers do not need to manually advance the notes; however, a performer without adequate practice may occasionally advance one note too many, and in this case, the performer will require a moment of tacet to recover.
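The note-update trigger described above can be sketched as a small state machine. This is only an illustration under stated assumptions: the knob position is normalized to the range 0 to 1, the trigger zone is symmetric around the plectrum center, and the `NoteAdvancer` class and its parameters are hypothetical names rather than part of the original implementation.

```python
from typing import List, Optional


class NoteAdvancer:
    """Advance through a note table each time the knob enters a zone around the plectrum."""

    def __init__(self, note_table_hz: List[float], center: float, threshold: float):
        self.note_table_hz = note_table_hz
        self.center = center        # assumed plectrum position on the fader
        self.threshold = threshold  # half-width of the trigger zone
        self.index = -1             # no note selected yet
        self.armed = True           # re-armed only after the knob leaves the zone

    def update(self, knob_position: float) -> Optional[float]:
        """Return the next fundamental frequency when a note update fires, else None."""
        inside = abs(knob_position - self.center) < self.threshold
        if inside and self.armed:
            self.armed = False
            # Stay on the last note once the table is exhausted.
            self.index = min(self.index + 1, len(self.note_table_hz) - 1)
            return self.note_table_hz[self.index]
        if not inside:
            self.armed = True
        return None


# One confident pluck from left to right, then one back from right to left.
advancer = NoteAdvancer([220.0, 247.5, 330.0], center=0.5, threshold=0.05)
fired = [advancer.update(p) for p in (0.2, 0.46, 0.5, 0.54, 0.8, 0.53, 0.47, 0.2)]
```

In this sketch, a knob that hesitates at the edge of the zone and re-enters it fires a second update, which is consistent with the risk of advancing one note too many noted in the text.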

## *9.4.3 Compositional Structure*

As with *Of grating impermanence*, *Guest Dimensions* is performed from a fixed score. Performers play in precise time according to a pre-written score, sometimes in homorhythm. Each part for each section utilizes one of the two models, but adjustments of the models are unique to the sections of each part. Melodic themes in counterpoint are performed with the gayageum model and are accompanied by the decorative chimes of the granite block model. Extended percussive sections feature the granite block model in strict meter, save for a brief passage in which the performers are free to separately overlap in interpretive gestures.

## **9.5 Conclusions**

A case study was presented demonstrating some ways that force-feedback DMIs could be integrated into laptop orchestra practice. The contributing composers realized a variety of compositional structures, but more commonalities were found in the successful instrument design approaches that were applied. Accordingly, the authors suggest that composers working in this field should consider the following: (1) providing performers with precise, physically intuitive, and reconfigurable controls, (2) using traditional controls alongside force-feedback controls as appropriate, and (3) designing timbres that sound uncannily familiar but are nonetheless novel. These approaches enabled performance techniques that more closely resemble some traditional music performance techniques, which are less commonly observed in laptop orchestra practice.

## **References**


Haptic and Audio Interaction Design (HAID). Lecture Notes in Computer Science, **6306**, 61–62. Springer, Berlin, Heidelberg (2010)



# **Chapter 10 Design of Vibrotactile Feedback and Stimulation for Music Performance**

**Marcello Giordano, John Sullivan and Marcelo M. Wanderley**

**Abstract** Haptics, and specifically vibrotactile-augmented interfaces, have been the object of much research in the music technology domain: In the last few decades, many musical haptic interfaces have been designed and used to teach, perform, and compose music. The investigation of the design of meaningful ways to convey musical information via the sense of touch is a paramount step toward achieving truly transparent haptic-augmented interfaces for music performance and practice, and in this chapter we present our recent work in this context. We start by defining a model for haptic-augmented interfaces for music, and a taxonomy of vibrotactile feedback and stimulation, which we use to categorize a brief literature review on the topic. We then present the design and evaluation of a haptic language of cues in the form of tactile icons delivered via vibrotactile-equipped wearable garments. This language constitutes the base of a "wearable score" used in music performance and practice. We provide design guidelines for our tactile icons and user-based evaluations to assess their effectiveness in delivering musical information and report on the system's implementation in a live musical performance.

## **10.1 Introduction**

In recent years, the widespread availability of smartphones and tablet computers has made vibrotactile technology, in the form of actuators specifically designed to stimulate a user's sense of touch via vibration, inexpensive and readily available. Haptic researchers, both in academic and industrial contexts, have been designing ways of communicating via the sense of touch by means of tactile effects used to provide information such as navigational cues [50], textures [30], or notifications [44]. Systematic studies have been conducted to assess the efficiency of these effects in well-defined contexts, and new prototypes and applications are constantly being investigated.

M. Giordano · J. Sullivan · M. M. Wanderley

IDMIL (Input Devices and Music Interaction Laboratory), CIRMMT (Centre for Interdisciplinary Research in Music Media and Technology), McGill University, 527 Rue Sherbrooke Ouest, Montréal, QC H3A 1E3, Canada

M. Giordano e-mail: marcello.giordano@mail.mcgill.ca

J. Sullivan e-mail: john.sullivan2@mail.mcgill.ca

M. M. Wanderley e-mail: marcelo.wanderley@mcgill.ca

In the music domain, the sense of touch can be used to convey relevant musical information, such as articulation [43] and timing [51], especially in professional performances [29]. Several haptic interfaces for music performance and practice have been created in the last two decades, but the effectiveness of very few of them has been thoroughly evaluated.

In this chapter, we present our work in the development and preliminary evaluation of meaningful ways to provide information to performers via the sense of touch for music performance and practice. Our research, conducted in the context of a multidisciplinary project involving haptic researchers, composers, and wearable designers, is aimed at the development of a language of tactile icons specifically designed to convey musical information to professional musicians. These icons, delivered via specialized garments equipped with arrays of vibrotactile actuators, have been evaluated to determine their effectiveness and reliability. They will be used as the building blocks of a *wearable score* language, which composers will use to create new pieces and art installations.

To provide a theoretical framework for this research, we present a brief overview of the current state of haptic feedback and stimulation in music technology. We expand the classical models of digital musical instruments (DMIs) [39] to include *general-purpose* tactile interfaces, i.e., devices where other sensory feedback may not be present and tactile feedback can be arbitrarily mapped to external sources of information. We then present a literature review together with a taxonomy of tactile feedback and stimulation. This categorization is aimed at emphasizing the different functional roles that haptic technology can achieve in conveying musically relevant information.

## **10.2 Haptic Feedback in Music Technology**

Haptic technology has been widely used in the development of interfaces for musical expression and musical interaction, and two main classes of devices can be identified in this context: DMIs and general-purpose haptic interfaces.

In traditional musical instruments, the tactile and kinaesthetic feedback coming from the resonating parts of the instrument gives the performer important information about their interaction [1, 20, 28, 43] (see Chap. 2). In DMIs, the decoupling of gesture acquisition from sound synthesis has the important effect of breaking the mechanical feedback loop between performer and sound-producing structures. Haptic feedback then becomes an arbitrary design factor [31], and the choice of actuators and signals used to drive them (see Sect. 13.2) defines the instrument's architecture.

Haptic devices can provide tactile cues during performance with DMIs, not only if embedded into the instruments themselves, but also when deployed separately by means of tactile displays and wearable devices that can be used to go beyond the direct performer–instrument interaction. In the context of music performance, these devices, which we refer to as *general-purpose haptic interfaces*, can convey information about performers' interactions with a live-electronics system [37] or serve as learning tools to direct and guide users' gestures via vibrotactile feedback [49] (see also Chap. 11). They can also be used to convey score cues to a performer on stage [45] by means of abstract languages of tactile icons [33]. In this context, the distinction between feedback and stimulation becomes clear: The former is a direct response of the instrument or the general-purpose interface to a user's action; the latter is not issued from a player–device interaction, but it is a means of communication with the user, mediated by the tactile actuators in the interface, which can be used to convey any sort of information.

These displays usually provide either localized (i.e., single body site) or distributed vibrations (via actuators placed on multiple body sites), requiring the design of tactile effects more centered on temporal or spatial properties, respectively, or a combination of both.

## *10.2.1 Models of Haptic-Enabled Interfaces*

The relationship between performer, haptic-enabled musical interface (either general-purpose device or DMI), and audience can be complex, and a number of abstract models of the interaction between these components can be found in the literature. In the case of DMIs several models exist, each of which emphasizes different aspects of the instrument's design. Marshall [34] reviews four of these models [4, 5, 9, 54] and proposes a hybrid model merging characteristics across them.

In Fig. 10.1, we present an extension of this model, which is a representation of the interaction with either haptic-enabled DMIs or general-purpose devices. While the former can provide the performer with both kinaesthetic and tactile feedback, the latter are usually implemented as vibrotactile displays, for reasons that are mainly to be found in current technology limitations.<sup>1</sup> As mentioned above, the haptic channel does not need to be limited to the display of feedback issued as a direct response to performers' actions, but can be mapped arbitrarily to convey information from external sources such as environmental variables or score parameters. This is represented by the *external information* source in our model.

<sup>1</sup>We refer here to the case of general-purpose interfaces developed for musical applications. These displays are generally conceived as portable/wearable devices to be used by musicians either practicing or performing on stage. Kinaesthetic devices, on the other hand, are generally much larger in scale and are hence difficult to integrate into the design of a portable, general-purpose musical interface.

**Fig. 10.1** Model of a haptic DMI and general-purpose haptic device. In both devices, a haptic generator is used to produce haptic feedback and stimulation, which is issued from mapping of sensor data or external information. The simultaneous use of both types of devices is also possible, and sensor data from either device could be mapped to the haptic generator of the other

## *10.2.2 Haptic-Enabled Interfaces*

Haptic-enabled interfaces for music performance can be categorized according to the way they deliver haptic feedback and stimulation to the final users. Both DMIs and general-purpose devices can address either the kinaesthetic or the tactile modality, and this can be done in an *active* or a *passive* way [5]: Passive feedback and stimulation come from the inherent physical properties of the interface and are not issued by the system's haptic generator; active interfaces implement a haptic generator to provide the user with the designed kinaesthetic and tactile effects.

We will present some of the most important devices present in the literature following these two categories and provide a threefold taxonomy for the *active tactile* case.

#### **10.2.2.1 Passive Kinaesthetic Feedback**

Passive kinaesthetic feedback and stimulation are inherent to the physical characteristics of the controller, and do not require any externally synthesized signal.

O'Modhrain and Essl developed three DMIs that implement passive kinaesthetic feedback. The Pebble Box and the Crumble Bag [41] were used to control an event-based granular synthesizer: the Pebble Box consists of a box filled with different-sized pebble stones and a microphone that picks up the noise produced by the collisions between pebbles. The kinaesthetic feedback offered by the interface comes from the physical properties of the pebbles themselves, and the impact sounds act as triggering events on the granular synthesizer. The Crumble Bag follows the same pattern and aims to take advantage of natural "grabbing gestures." A fabric bag is filled with different materials that provide haptic feedback, and a small microphone in the bag provides the necessary event triggers to the algorithm. The Scrubber [14] also implemented the same approach: an eraser embedded with a force sensor and two microphones was used to control friction sounds generated by means of granular or wavetable synthesis. The haptic feedback again was directly issued by the manipulation of the device dragged along a surface.

Sinyor and Wanderley [47] developed the Gyrotyre, a handheld controller based on a spinning wheel, in which the kinaesthetic feedback comes directly from the dynamic properties of the system. The mapping and synthesis algorithms are designed to take advantage of the haptic feedback, and the interface can be used for different musical applications, such as sequencing or modifying the parameters of effects.

#### **10.2.2.2 Active Kinaesthetic Interfaces**

Active kinaesthetic feedback is the response of the controller to the user's actions, usually by means of synthesized signals supplied into motors or actuators, which stimulate kinaesthetic receptors. This is most commonly referred to as force feedback.

The earliest example of a force-feedback device specifically developed for musical applications is probably the Transducteur Gestuel Rétroactif (TGR) developed at ACROE, whose development is described in Sect. 8.3. This device was recently used by Sinclair et al. [46] to investigate velocity estimation methods in the haptic rendering of a bowed string.

Another classical example is the Moose, developed by O'Modhrain and Gillespie [42], consisting of a plastic puck that the user can manipulate in a 2D space, which is attached to flexible metal bars, connected to linear motors. Two encoders sense the movements of the puck, and the motors provide the corresponding force feedback. The device was used in a bowing test, using a virtual string programmed in Synthesis ToolKit (STK) [10], where the presence of friction between the bow and the string was simulated using the haptic device.

The vBow by Nichols [40] is a violin-like controller that uses a series of servomotors and encoders to sense the movement of a rod, acting as the bow, connected to a metallic cable. In its latest incarnation, the vBow is capable of sensing movement in 4-DoF and producing haptic feedback accordingly.

More recently, Berdahl and Kontogeorgakopoulos [2] developed the FireFader, a motorized fader that uses sensors and DC motors, to introduce musicians to haptic controllers. Both the software and hardware used for the project are open-source, allowing musicians to customize the mapping of the interface to their specific needs. Applications of the device are described in Chap. 9.

#### **10.2.2.3 Passive Tactile Interfaces**

Passive tactile feedback is a form of primary feedback, which leverages the use of different types of materials in a controller for musical expression. The properties of these materials (e.g., stiffness, flexibility) can affect the ergonomics of the instrument and its feel in the user's hands.

As an example, the Meta-Instrument [11] has the form of a partial exoskeleton embedded with buttons that the performer uses to trigger samples and events in the sound; the performer's gestures are captured via sensors in the arms and mapped to various effects. The buttons embedded in the controller are covered in a layer of foam, providing the user with immediate passive feedback about the level of pressure applied.

#### **10.2.2.4 Active Tactile Feedback and Stimulation: A Taxonomy for Musical Interaction**

Active tactile feedback and stimulation are the main focus of this chapter, and for this reason we provide a more in-depth analysis of the related literature, as well as an updated taxonomy, based on Giordano and Wanderley [19], which will help categorize examples in this field.

We propose a classification that identifies three categories of active tactile feedback and stimulation, according to the function that the tactile effects have in the interface design: *tactile notification*, *tactile translation*, and *tactile languages*.

#### Tactile Notification

The most straightforward application of tactile stimulation is intended for notifying the users about events taking place in the surrounding environment or about results of their interaction with a system. The effects designed for this kind of application can be as simple as single, supra-threshold stimuli<sup>2</sup> aimed at directing users' attention, but they can also be more complex, implementing temporal envelopes and/or spatial patterns.

Michailidis and Berweck [37] and Michailidis and Bullock [38] have explored solutions to provide haptic feedback in live-electronics performance. The authors developed the Tactile Feedback Tool, a general-purpose interface using small vibrating motors embedded in a glove. The interface gave musicians information about the successful triggering of effects in a live-electronics performance, using an augmented trumpet or a foot pedal switch. This device leverages the capacity of the tactile sense to attract users' attention, while not requiring them to lose focus on other modalities, which would have been the case with the use of onstage visual displays.

<sup>2</sup>Stimuli whose intensity exceeds vibrotactile thresholds and are thus perceivable (see Sect. 4.2).

Van der Linden et al. [49] implemented a whole-body general-purpose vibrotactile device. The authors used a motion capture system and a suit embedded with vibrating motors distributed over the body to enhance the learning process of bowing for novice violin players. A set of ideal bowing trajectories was computed using the motion capture system; when practicing, the players' postures would be compared in real time with the predefined ideal trajectories. If the distance between any two corresponding points in the two trajectories exceeded a threshold value, the motor spatially closest to that point would vibrate, notifying the users to correct their posture. The authors conducted a study in which several players used the suit during their violin lessons. Results showed an improved coordination of the bowing arm, and participants reported an enhancement in their body awareness produced by the feedback.
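The comparison step described above can be sketched as follows. This is a simplified illustration and not the authors' implementation: points are assumed to be 3D coordinates sampled at matching instants, only the largest deviation in a frame is reported, and the function name, motor names, and coordinates are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def motor_to_vibrate(
    ideal: Sequence[Point],
    actual: Sequence[Point],
    motors: Dict[str, Point],
    threshold: float,
) -> Optional[str]:
    """Return the motor nearest to the worst deviating point, or None if all are within threshold."""
    worst_distance = threshold
    worst_point: Optional[Point] = None
    for ideal_pt, actual_pt in zip(ideal, actual):
        distance = math.dist(ideal_pt, actual_pt)  # Euclidean distance
        if distance > worst_distance:
            worst_distance, worst_point = distance, actual_pt
    if worst_point is None:
        return None
    # Pick the motor whose (assumed known) body position is closest to that point.
    return min(motors, key=lambda name: math.dist(motors[name], worst_point))


# Hypothetical data: two tracked points and two motors.
ideal_path = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
played_path = [(0.0, 0.0, 0.0), (1.0, 0.3, 0.0)]
suit_motors = {"shoulder": (0.0, 0.0, 0.0), "wrist": (1.0, 0.2, 0.0)}
cue = motor_to_vibrate(ideal_path, played_path, suit_motors, threshold=0.1)
```

A real system would run this check on every motion capture frame and would probably need some smoothing to avoid spurious cues, which the sketch omits.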

A similar solution was developed by Grosshauser and Hermann [21], who used a vibrating actuator embedded in a violin bow to correct hand posture. Using accelerometers and gyroscopes, the position of the bow could be compared in real time to a given trajectory, and the tactile feedback would automatically activate to notify the users about their wrong posture.

#### Tactile Translation

With tactile translation, we refer to two separate classes of applications: One class implements sensory substitution techniques to convey to the sense of touch stimuli which would normally be addressed to other modalities; the other class simulates the haptic behavior of other structures whose vibrational properties have previously been characterized.

#### *Sensory Substitution*

The field of sensory substitution has been thoroughly investigated since the beginning of the last century. In 1930, von Békésy started investigating the physiology behind tactile perception by drawing a parallel between the tactile and the auditory channels in terms of the mechanisms governing perception in the two modalities [53]. A thorough review of sensory substitution applications can be found in Visell [52]. In a musical context, several interfaces have been produced with the aim of translating sound into perceivable vibrations delivered via vibrotactile displays. *Crossmodal mapping* techniques can be utilized to perform the translation, identifying sound descriptors to be mapped to properties of vibrotactile feedback.

Karam et al. [27] developed a general-purpose interface in the form of an augmented chair (the Emoti-Chair) embedded with an array of eight speakers arranged along the back. The authors' aim was to create a display for deaf people to enjoy music through vibrations. They developed the Model Human Cochlea [26] (a sensory substitution model of the cochlear critical band filter on the back) and mapped different frequency bands of a musical track, rescaled to fit into the frequency range of sensitivity of the skin (see Sect. 4.2), to each of the speakers on the chair. In a related study, Egloff et al. [12] investigated people's ability to differentiate between musical intervals delivered via the haptic channel, finding that, on average, the smallest perceptible difference was a major second (i.e., two semitones). It was also noted that results vary widely due to the sensitivity levels of different receptive fields across the human body. Thus, care must be taken when designing vibrotactile interfaces intended to be used as a means for sensory substitution.
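The idea of assigning frequency bands to an array of actuators can be illustrated with a minimal sketch. It is not the Model Human Cochlea itself: the logarithmic band spacing, the frequency limits, and the function names are assumptions made for illustration, and the rescaling into the skin's sensitivity range mentioned above is left out.

```python
import math
from typing import List


def band_edges(low_hz: float, high_hz: float, channels: int = 8) -> List[float]:
    """Logarithmically spaced band edges, one band per channel (assumed spacing)."""
    step = (math.log2(high_hz) - math.log2(low_hz)) / channels
    return [2 ** (math.log2(low_hz) + i * step) for i in range(channels + 1)]


def channel_for(frequency_hz: float, low_hz: float, high_hz: float, channels: int = 8) -> int:
    """Index of the channel whose band contains frequency_hz, clamped to the overall range."""
    f = min(max(frequency_hz, low_hz), high_hz)
    span = math.log2(high_hz) - math.log2(low_hz)
    position = (math.log2(f) - math.log2(low_hz)) / span  # 0.0 .. 1.0
    return min(int(position * channels), channels - 1)


# Eight one-octave bands between 20 Hz and 5120 Hz (illustrative limits).
edges = band_edges(20.0, 5120.0)
speaker = channel_for(60.0, 20.0, 5120.0)
```

In a complete system, each band would be filtered from the audio signal and shifted or rescaled before driving its actuator; the sketch only shows which channel a given frequency component would be routed to.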

Merchel et al. [36] developed a prototype mixer equipped with a tactile translation system to be used by sound recording technicians. A mixer augmented with an actuator would allow the user to recognize the instrument playing in the selected track only by means of tactile stimulation: A tactile preview mode would be enabled on the mixer, performing a real-time translation of the incoming audio. Preliminary results show that users were able to recognize different instruments only via the sense of touch; better performance was obtained for instruments producing very low-frequency vibrations (bass) or strong rhythmical patterns (drums). A similar touch screen-based system and related test applications are described in Chap. 12.

#### *Tactile Stimulation*

In tactile stimulation applications, the vibrational behavior of a vibrating structure is characterized and modeled so as to be able to reproduce it in another interface. Examples in this category include physical modeling of the vibrating behavior of a musical instrument, displayed by means of actuators.

A DMI featuring tactile stimulation capability is the Viblotar by Marshall [35]. The instrument is composed of a long, narrow wooden box equipped with sensors and embedded speakers. Sound is generated from a hybrid physical model of an electric guitar and a flute programmed in the Max/MSP environment. During performance, the instrument rests on the performer's lap or on a stand. One hand manipulates a long linear position sensor and matching force sensitive resistor (FSR) underneath to "pluck" a virtual string. The location, force, and speed of the motion are mapped to frequency, amplitude, and timbre parameters of the physical model. The other hand operates two small FSRs which control pitch bend up and down. The sound output from the Viblotar can be redirected to external speakers, hence allowing the embedded speakers to function primarily for generating vibrotactile feedback instead of sound output. In this configuration, the sound output is split, with one signal sent directly to the external speakers and another routed through a signal processing module that can produce a variety of customized vibrotactile effects such as compensating for frequency response of loudspeakers, simulating the frequency response of another instrument or amplifying the frequency band to which the skin is most sensitive [34].

#### Tactile Languages

Tactile languages are an attempt to create compositional languages solely addressed to the sense of touch, in which tactile effects are not just simple notifications, issued from the interaction with a system, but can be units or icons for abstract communication mediated by the skin.

An early example of tactile language is the "vibratese," proposed by Geldard [16], who aimed at creating a completely new form of tactile communication delivered by voice coil actuators (see Sect. 13.2). Parameters for defining building blocks for the language would be elements such as frequency, intensity, and waveform. A total of 45 unit blocks representing numbers and letters of the English alphabet were produced, allowing for expert users to read at rates up to 60 words per minute.

More recently, much research on tactile languages has been directed toward the development of tactile icons. Brewster and Brown [6] introduced the notion of *tactons*, i.e., tactile icons to be used to convey non-visual information by means of abstract or meaningful associations, which have been used to convey information about interaction with mobile phones [8]. Enriquez and MacLean [13] studied the learnability of tactile icons delivered to the fingertips by means of voice coil-like actuators. By modulating frequency, amplitude, and rhythm of the vibration, they produced a set of 20 icons, which were tested in a user-based study organized in two sessions, two weeks apart. Participants' recognition rates reached 80% in the first session after 10 min of familiarization with the system and more than 90% during the second session.

In a musical context, attempts to create compositional languages for the sense of touch can be found in the literature. Gunther [22] developed the Skinscape system, a tactile compositional language whose building blocks varied in frequency, intensity, envelope, spectral content of vibrations, and spatial position on the body of the user. The language was at the base of the Cutaneous Grooves project by Gunther and O'Modhrain [23], in which it was used to compose a musical piece to be accompanied by vibrations delivered by a custom-built set of suits embedded with various kinds of actuators.

In terms of tactons, we are not aware of any study evaluating their effectiveness in the context of music performance and practice. This is the object of the remainder of this chapter, where we present the design and evaluation of tactile icons for expert musicians.

## **10.3 Development and Evaluation of Tactile Icons for Music Performance**

Our focus in this section is the development of a language of vibrotactile cues, in the form of tactile icons, to be used by musicians. We present the design process behind the tactons we developed and a methodology for evaluating their effectiveness when delivered via tactile-augmented garments. Our work was conducted in the context of *Musicking the Body Electric*, a four-year (2014–2018) multidisciplinary project involving researchers from the fields of haptics, music technology, music education, composition, and wearable electronics.<sup>3</sup>

<sup>3</sup>Principal investigators: Sandeep Bhagwati (Matralab, Concordia University, Montreal), Marcelo M. Wanderley (McGill University, Montreal), Isabelle Cossette (MPBL, McGill University), Joanna Berzowska (XS Labs, Concordia University); funded by the Social Sciences and Humanities Research Council of Canada.

The ultimate goal of the project is to develop tactile-augmented suits and a language of tactons [7] to be used as building blocks for a *wearable score* system. The language will allow composers to convey musical information via tactile stimulation in the context of a music performance in which musicians are free to walk in the performance space. The augmented garments will be able to sense the location of the musicians in the performance space and also the position of musicians relative to one another. This, for instance, would allow each of the suits to be aware of the proximity of other musicians in the room and cue them to play a given section of the piece by delivering the corresponding tactile icon.

## *10.3.1 Hardware and Software*

The work we present is the result of the first tests conducted on two specialized garments produced for the project: an augmented belt embedded with six vibrating actuators and an elastic band embedded with a single actuator that could be worn around an arm or leg. These garments were developed taking advantage of the hardware and software we helped to create for Ilinx, a multisensory art installation featuring a whole-body suit embedded with vibrating actuators [18].

The garments created for Ilinx feature a custom-designed Arduino-compatible board embedded with motor drivers and a Serial Peripheral Interface (SPI) bus. Each board can control up to six actuators independently and is connected to a BeagleBone Black (BBB)<sup>4</sup> minicomputer via an Ethernet to SPI adapter. The BBB implements an Open Sound Control (OSC) parser which receives control commands from a Max-based synthesizer via a wireless network, and dispatches each message to the correct board and actuator via SPI.
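The routing step described above can be sketched as follows. This is only an illustration and not the project's code: the OSC address pattern `/board/<n>/actuator/<m>`, the function name `route`, the 0 to 1 intensity argument, and the 8-bit output value are all assumptions made for the example, and the SPI write is replaced by a returned tuple.

```python
import re

# Hypothetical address scheme (an assumption, not taken from the project):
#   /board/<board index>/actuator/<actuator index>   <intensity in 0..1>
ADDRESS = re.compile(r"^/board/(\d+)/actuator/(\d+)$")
ACTUATORS_PER_BOARD = 6  # each board controls up to six actuators


def route(address, intensity):
    """Map one already-decoded OSC message to (board, actuator, 8-bit level)."""
    match = ADDRESS.match(address)
    if match is None:
        raise ValueError(f"unrecognised address: {address}")
    board, actuator = int(match.group(1)), int(match.group(2))
    if actuator >= ACTUATORS_PER_BOARD:
        raise ValueError(f"a board has only {ACTUATORS_PER_BOARD} actuators")
    # Clamp the intensity and scale it to an assumed 8-bit driver value.
    level = min(max(float(intensity), 0.0), 1.0)
    return board, actuator, round(level * 255)
```

In a real system the returned tuple would be handed to whatever code writes to the SPI bus; keeping the parsing separate from the bus access makes the routing logic easy to test without hardware.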

Solarbotics VPM2<sup>5</sup> actuators were used for the garments. This type of ERM actuator (see Sect. 13.2) was chosen for its ready availability, low cost, and simple design, and had previously been characterized for both its physical and perceptual properties [15].

The wearable designers involved in the project (Joanna Berzowska and Alex Bachmayer, XS Labs, Concordia University) produced the first specialized garment for us to test: a tactile-augmented belt with six equally spaced ERM actuators (Fig. 10.2). The choice of a belt as the first garment to be designed was guided by several reasons: The placement of the actuators on a circle around the user's waist allowed for more flexibility in terms of tactile effects design; more practically, a belt provides an easier fit compared to leggings or sleeves, for instance [48, 50].

A second garment was also introduced, consisting of a single actuator mounted on an adjustable band made of stretchable fabric, which could be easily worn on body parts such as wrist, upper arm, or ankle.

<sup>4</sup>https://beagleboard.org/black (last accessed on December 17, 2017).

<sup>5</sup>https://solarbotics.com/product/vpm2/ (last accessed on December 17, 2017).

**Fig. 10.2** Augmented belt embedded with six vibrating actuators (garment design and manufacturing by J. Berzowska and A. Bachmayer—XS Labs, Concordia University)

## *10.3.2 Symbolic and Musical Tactons: Design and Evaluation*

In the early phase of the project, our approach consisted of designing two sets of tactons, to be reproduced by the belt and the band, respectively. The former would be used to convey *symbolic* tactons, i.e., abstract patterns that musicians would need to learn and associate with specific musical elements, for instance sections of a score or chords. The latter would instead deliver *musical* tactons, i.e., tactons which carry a unique musical meaning, attached to the temporal properties of the tacton itself.

## **10.3.2.1 Symbolic Tactons**

We identified three different dimensions defining the tacton design space associated with the six-actuator belt:


For the design of the symbolic tactons, we applied a heuristic approach: We defined several geometric patterns which we hypothesized would feature unique characteristics, making them easily distinguishable from one another; we then implemented these patterns, together with preliminary global and individual temporal properties, on a Max-based tactile sequencer we programmed to control the belt; a music pedagogy doctoral researcher (Audrey-Kristel Barbeau) then tested the icons and provided immediate feedback, allowing us to proceed to another iteration of the design process.


**Fig. 10.3** Final set of ten symbolic icons developed for the belt (diagram courtesy of A.-K. Barbeau). Each black dot represents one actuator. The hexagon shapes represent the actuators disposed around a user's waist, with the top two actuators corresponding to the person's front. Icons 1–4 feature a sequence of actuations which follow the direction indicated by the arrows. For icons 5–10, connected dots represent simultaneous activation of the corresponding actuators, with solid lines happening first, followed by dashed and then dotted lines. Each actuation lasts 200 ms, as per haptic envelope definition, and for each icon the pattern is repeated twice with a 300 ms interval between repetitions

This process lasted several weeks, after which we finalized a set of ten tactons, depicted in Fig. 10.3. Each of the tactile icons consists of two repetitions of the same pattern which are separated by a fixed time interval. The tactons have a total duration which varies from 1.5 to 2.7 s. For the individual temporal properties, we chose a fixed envelope for all the actuations which features 50 ms of attack, 150 ms of sustain at maximum intensity, and no release time (see Fig. 10.4). We decided to keep the vibrotactile envelope parameters fixed for this initial phase of the project to facilitate the tactons' learning phase. These tactile icons were presented to two undergraduate music students, a saxophone player (performer 1) and a guitar player (performer 2), who were the participants for the ensuing evaluation sessions.
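The timing figures given for the symbolic tactons can be checked with a short sketch. It assumes a linear attack ramp and assumes that the actuations within a pattern follow one another without pauses; neither detail is stated in the text, and the function names are illustrative.

```python
ATTACK_MS = 50     # ramp from zero to maximum intensity
SUSTAIN_MS = 150   # hold at maximum intensity; there is no release phase
STEP_MS = ATTACK_MS + SUSTAIN_MS  # 200 ms per actuation
GAP_MS = 300       # interval between the two repetitions of a pattern


def envelope(t_ms):
    """Intensity (0..1) of a single actuation, t_ms after its onset."""
    if t_ms < 0 or t_ms >= STEP_MS:
        return 0.0
    if t_ms < ATTACK_MS:
        return t_ms / ATTACK_MS  # linear attack is an assumption
    return 1.0


def tacton_duration_ms(steps):
    """Total duration of a tacton whose pattern has `steps` actuations,
    assuming the actuations are played back to back."""
    return 2 * steps * STEP_MS + GAP_MS
```

Under these assumptions a three-step pattern lasts 1500 ms and a six-step pattern lasts 2700 ms, which matches the two ends of the reported 1.5 to 2.7 s range.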

The symbolic tactons we designed for the belt do not carry any musical or other meaning per se, and need to be learned by the performers to be proficiently used to convey musical information. These icons can be mapped to several musical functions, such as chords or sections of a piece, and these mappings also need to be mastered by musicians to be correctly interpreted.

(a) The crescendo tacton is achieved by means of exponentially increasing the duty cycle from 20% (perceptual threshold) to 100% over 2000 ms.

(b) The envelope for the decrescendo tacton goes from 100% to 20% duty cycle over 2000 ms, by using a negative exponential function.

(c) The staccato tacton is obtained by presenting three 100 ms long vibrations at 100% duty cycle, with a 100 ms interval between each peak.

(d) The legato tacton features 2 periods of a scaled sine wave going from 20% to 100% over 1000 ms.

**Fig. 10.5** Schematization of the envelopes of the four musical tactons developed for the single-actuator band

#### **10.3.2.2 Musical Tactons**

While the symbolic tactons were designed by first creating geometric and temporal patterns for the vibrotactile stimuli which could later be mapped arbitrarily to musical functions, design of musical tactons for the single-actuator band took the opposite approach. For these, we started by determining the set of musical information this actuator would deliver. From experiences we gathered in our previous work [15], we hypothesized that a single-actuator configuration could be used to provide tempo cues, as well as information about articulation and dynamics.

Using the heuristic approach based on iterative feedback from A.-K. Barbeau, we designed a set of four musical tactons associated with *crescendo*, *decrescendo*, *staccato*, and *legato*, respectively, which are shown in Fig. 10.5. These tactons contained a musical meaning attached to the temporal properties of the tacton itself and would ideally require a minimal effort to be correctly interpreted.
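The four envelopes described in Fig. 10.5 can be sketched as functions of time. The duty-cycle limits and durations come from the figure captions; the exact exponential form (geometric interpolation between the end points) and the starting phase of the legato sine are assumptions, as are the function names.

```python
import math

LOW, HIGH = 20.0, 100.0  # duty cycle in percent; 20% is the perceptual threshold


def crescendo(t_ms, duration_ms=2000.0):
    """Exponential rise from LOW to HIGH (geometric interpolation assumed)."""
    return LOW * (HIGH / LOW) ** (t_ms / duration_ms)


def decrescendo(t_ms, duration_ms=2000.0):
    """Exponential fall from HIGH to LOW, mirroring the crescendo."""
    return HIGH * (LOW / HIGH) ** (t_ms / duration_ms)


def staccato(t_ms, pulse_ms=100.0, gap_ms=100.0, pulses=3):
    """Three full-intensity pulses separated by silent gaps."""
    total_ms = pulses * pulse_ms + (pulses - 1) * gap_ms
    if t_ms < 0 or t_ms >= total_ms:
        return 0.0
    return HIGH if t_ms % (pulse_ms + gap_ms) < pulse_ms else 0.0


def legato(t_ms, duration_ms=1000.0, periods=2):
    """Two periods of a sine scaled to LOW..HIGH (starting at LOW is assumed)."""
    mid, half_range = (HIGH + LOW) / 2, (HIGH - LOW) / 2
    return mid - half_range * math.cos(2 * math.pi * periods * t_ms / duration_ms)
```

Sampling any of these functions at a fixed rate yields a sequence of duty-cycle values that could drive a single ERM actuator; the functions are only defined meaningfully within each tacton's stated duration.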

#### **10.3.2.3 Preliminary Evaluation**

We conducted a preliminary evaluation of both symbolic and musical tactons' design with our two musicians, who performed a series of musical tasks we associated with each of the icons. It was important for us to evaluate the learnability and recognition rate of the tactons in the context of music performance in order to establish if musicians actively engaged in a musical task could reliably recognize and respond to the given tactile icons.

We performed two testing sessions, two weeks apart, following a methodology similar to the one reported in [13]. The musicians had 20 min per session to familiarize themselves with the tactons. Subsequently, they were asked to perform two recognition tasks. In task 1, they experienced a series of tactons and verbally reported the name or number of the tacton they thought they had perceived. In task 2, the musicians were given a score, shown in Fig. 10.6, and asked to perform the melody associated with the perceived icons. The melodies were composed to be easy to sight-read and perform. In the first session only symbolic tactons were tested, while in the second session we tested both symbolic and musical tactons. Performances were audio-recorded and subsequently analyzed to determine recognition rates of the tactile icons in both sessions.
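As an illustration of how a recognition rate of this kind can be computed, the sketch below compares the sequence of presented tactons with the sequence of tactons a participant reported. The function names and the response data are hypothetical and are not taken from the study.

```python
from collections import Counter


def recognition_rate(presented, reported):
    """Percentage of trials in which the reported tacton matches the presented one."""
    if len(presented) != len(reported) or not presented:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p == r for p, r in zip(presented, reported))
    return 100.0 * hits / len(presented)


def confusions(presented, reported):
    """Count each (presented, reported) pair that was a misidentification."""
    return Counter((p, r) for p, r in zip(presented, reported) if p != r)


# Hypothetical responses for illustration only; these are not the study's data.
presented = [1, 5, 3, 9, 5, 2, 7, 5, 10, 4]
reported = [1, 9, 3, 9, 5, 2, 7, 9, 10, 4]
rate = recognition_rate(presented, reported)  # 8 of 10 correct
pairs = confusions(presented, reported)       # which tactons were mixed up
```

Counting the confused pairs alongside the overall rate helps to show whether errors are spread evenly or concentrated on a few tactons that are hard to tell apart.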

#### Session 1

Two repetitions of task 1 were performed 10 min apart. The results are depicted in Fig. 10.7a and show the average recognition rate of twenty randomly ordered tactons for each of the two repetitions. For the first trial, the two musicians correctly identified 86 and 77% of the tactons, respectively. In the second repetition, both performers achieved 88%.

For task 2, we provided the musicians with the score shown in Fig. 10.6. This time we asked them to play the melody corresponding to the perceived tactile icon. The musicians were free to play at the tempo they desired. Fifteen randomly ordered icons were tested, and a new icon would be delivered via the belt while the musician was playing the half note ending the previous melody. Task 2 was repeated three times, 10 min apart, and the results are depicted in Fig. 10.7b. The performers reached, respectively, a 92 and 79% recognition rate for the first trial, 92 and 86% for the second trial, and 88 and 71% for the last trial. Results declined for both performers in the third trial; we discuss possible factors in Sect. 10.3.2.4.

#### Session 2

A second session took place two weeks after session 1, testing both symbolic and musical tactons. Following the previously described protocol, we performed task 1 first, whose results are depicted in Fig. 10.8a.

For task 2, the musicians wore the belt and the single-actuator elastic band on their left upper arm. A symbolic icon would be delivered via the belt, followed by a musical icon from the single actuator. The musicians were asked to play the corresponding melody following either the articulation or the dynamics indicated by the musical tacton. Results are shown in Fig. 10.8b. For the symbolic icons, the first performer reached a recognition rate of 87% in the first trial, 86% in the second, and 70 and 78% in the third and fourth, respectively. A similar trend can be observed for the musical icons, with a 100% recognition rate in the first repetition, followed by 92, 82, and 88% in the last three trials. The second musician performed less well in this task, reaching a 78% recognition rate for symbolic tactile icons in trial one, 71% for trial two, and 76 and 77% for trials three and four, respectively. For the musical tactons, only 25% of the tactile icons were correctly recognized in trial one, 66% in trial two, and 77 and 57% in trials three and four, respectively.

**Fig. 10.6** Set of 10 simple melodies, composed by A.-K. Barbeau and associated with the ten symbolic tactile icons. The performer would feel one of the tactons on the augmented belt and perform the corresponding melody

#### **10.3.2.4 Musician's Feedback and Discussion**

The two testing sessions with the undergraduate musicians show several patterns: Performers' recognition rate in both sessions was consistently around or above 80% for task 1, even after only 20 min of practice with the belt (consistent with findings in Enriquez and MacLean [13]). This suggests that for both the musical and the symbolic tactons, we were able to design learnable and distinguishable tactile icons.

When looking at the data for task 2, in both sessions we can observe important differences between the two performers. Performer 1 consistently achieved better results than performer 2, who afterward reported that the task could quickly become overwhelming, especially in the second session. This suggests that the complexity of the task prevented performer 2 from simultaneously paying attention to both types of tactile icons while reading and playing the melodies on the instrument. Performer 2's performance nonetheless improved over time, as visible in Fig. 10.8b, going from a 25% recognition rate for the musical icons in trial one to almost 80% in trial three.

(b) Task 2: Play melody corresponding to the perceived tacton as indicated on the score in Fig. 10.6.

**Fig. 10.7** Recognition rates for session 1 for both task 1 and task 2. Recognition rate is consistently around 80% for both performers

Two trends can be identified for participant 1: Scores were above 80% in most of the tasks across the two sessions, and, in both sessions, performance in the musical task decreased in trial three compared to the first two trials. The decrease might be due to adaptation effects, which would reduce the sensitivity to the tactile icons. The musician stated that the tasks were not too demanding and that the icon design made it easy to differentiate the tactile effects.

Overall, the variation between the two participants could be caused by different levels of proficiency on their instrument and ability to sight-read, despite their similar self-assessed musical expertise: Participant 1 was very confident in the sight-reading and performance of the melodies we proposed, while for participant 2 this task proved to be quite demanding, as demonstrated by the frequent hesitation in performing the given melodies which can be heard in the audio recording of the testing sessions. The different postures adopted by the two musicians when playing the saxophone and the guitar, respectively, could also be partly responsible for the variation between the two participants, but this aspect would require an investigation conducted on a larger group of musicians. Additionally, the limited number of repetitions and subjects makes it difficult to draw definitive conclusions about significant trends over repetitions, as randomness may have had an impact on the results.


**Fig. 10.8** Recognition rates for session 2 for both task 1 and task 2. Both symbolic and musical tactons were tested in this session. Results show recognition rates consistently around 80% for participant 1, while participant 2 performed less well in task 2

The observations reported above indicate that a satisfying degree of tactile icon recognition can be reached for both musical and symbolic tactons during the performance of a musical task, provided a high degree of confidence and expertise on the performer's side. While all the musical tactons were equally well recognized during the two testing sessions, symbolic tactile icons 5 and 6 were the most problematic ones in terms of recognition rates. Tacton 5 would often be confused with tacton 9 since, as reported by performer 1, the vibration coming from the two actuators on the sides would sometimes go unnoticed. This could be due to lower skin sensitivity in the waist area, which, combined with its peculiar geometrical pattern, made tacton 6 also difficult to recognize at times.

Ultimately, our results confirm that the transparency of a tacton [32] is not an absolute property of the tactile icon itself, but is strongly influenced by the global context in which tactile information is transmitted to users and by their available cognitive resources [44].

## *10.3.3 Implementation into Live Performance*

Following the evaluation sessions, the wearable score system was put into practice with a performance of *40 Icons about Art/Music* composed by Sandeep Bhagwati and performed by trombonist Felix Del Tredici.<sup>6</sup> The piece was the first étude to be composed for the augmented belt [17] and consisted of ten random repetitions of four musical tasks, each associated with one of the four symbolic icons chosen from the ten described in Sect. 10.3.2.1. In rehearsals, we worked with the performer to identify the set of four tactons to be used for the piece, which led to the selection of tactons 2, 3, 4, and 6 in Fig. 10.3. During the performance, a tacton would be delivered to the performer via the belt. He then had to execute the associated task once the corresponding tactile icon was recognized.

Following the performance, we asked the performer about his experience during the piece. He found the four icons easy to recognize, while admitting that it took a considerable effort to pay attention to the vibrations coming from the belt while performing the musical tasks.

## **10.4 Conclusions**

In this chapter, we presented a literature review of the use of haptic technology in music performance. Our focus was the design and implementation of solutions incorporating active vibrotactile feedback and stimulation. We presented a threefold taxonomy of applications in this domain and provided examples for each one of the categories we defined: tactile notification, translation, and languages.

<sup>6</sup>http://www.felixdeltredici.com/ (last accessed on Dec. 17, 2017).

In the second part of the chapter, we focused on tactile languages and presented the results achieved in *Musicking the body electric*, a multidisciplinary project to which we contributed by designing and evaluating tactile icons that convey score information to expert musicians. Several researchers have evaluated the use of such icons; to our knowledge, however, no previous evaluation of this type of tactile communication has been performed in the context of musical interaction. For our purposes, it was important to evaluate our approach in the performance of authentic musical tasks. The evaluation we presented shows that our design paradigms for the tactile icons allow for recognition rates consistently around 80% after 20 min of familiarization with the system. The musical tasks we proposed, on the other hand, seem to impact these recognition rates in a way that depends on the users' musical expertise, and the effect of learning is already visible during a single session.

Work continues on *Musicking the body electric* in all areas. Bhagwati composed *Fragile Disequilibria* [3], a piece for solo trombone and four spectators, for which new suit prototypes were designed with multiple ERM motors placed along the arms and legs, across the back and around the waist. New materials and technologies are also being tested to design a more robust and flexible platform for haptic garments that can be adapted to a number of different performance contexts. In addition to prototypes developed specifically for this project, a new modular wireless tactile system has also been introduced, where an array of self-contained, single-actuator devices called Vibropixels can be placed flexibly on a garment, allowing them to be moved or reconfigured depending on the application [24, 25]. Finally, new compositions are being created for the suits to explore some of the novel possibilities afforded by a vibrotactile score system, most notably the expanded use of physical space and movement among performers.

**Acknowledgements** We would like to thank the Social Sciences and Humanities Research Council (SSHRC) of Canada and the Natural Sciences and Engineering Research Council (NSERC) of Canada Discovery grant for supporting this research. Special thanks to Sandeep Bhagwati, Isabelle Cossette, Audrey-Kristel Barbeau, Deborah Egloff, Joanna Berzowska, Alexandra Bachmayer, and all the collaborators to the *Musicking the body electric* project.

## **References**



# **Chapter 11 Haptics for the Development of Fundamental Rhythm Skills, Including Multi-limb Coordination**

## **Simon Holland, Anders Bouwer and Oliver Hödl**

**Abstract** This chapter considers the use of haptics for learning fundamental rhythm skills, including skills that depend on multi-limb coordination. Different sensory modalities have different strengths and weaknesses for the development of skills related to rhythm. For example, vision has low temporal resolution and performs poorly for tracking rhythms in real time, whereas hearing is highly accurate. However, in the case of multi-limbed rhythms, neither hearing nor sight is particularly well suited to communicating exactly which limb does what and when, or how the limbs coordinate. By contrast, haptics can work especially well in this area, by applying haptic signals independently to each limb. We review relevant theories, including embodied interaction and biological entrainment. We present a range of applications of the Haptic Bracelets, which are computer-controlled wireless vibrotactile devices, one attached to each wrist and ankle. Haptic pulses are used to guide users in playing rhythmic patterns that require multi-limb coordination. One immediate aim of the system is to support the development of practical rhythm skills and multi-limb coordination. A longer-term goal is to aid the development of a wider range of fundamental rhythm skills including recognising, identifying, memorising, retaining, analysing, reproducing, coordinating, modifying and creating rhythms, particularly multi-stream (i.e. polyphonic) rhythmic sequences. Empirical results are presented. We reflect on related work and discuss design issues for using haptics to support rhythm skills. Skills of this kind are essential not just to drummers and percussionists but also to keyboard players and more generally to all musicians who need a firm grasp of rhythm.

S. Holland (✉)

Music Computing Lab, Centre for Research in Computing, The Open University, Milton Keynes MK7 6AA, UK e-mail: s.holland@open.ac.uk; simon.holland@open.ac.uk

A. Bouwer

Faculty of Digital Media and Creative Industries, Amsterdam University of Applied Sciences, Wibautstraat 2-4, 1091 GM Amsterdam, The Netherlands e-mail: a.j.bouwer@hva.nl

O. Hödl

Cooperative Systems Research Group, Faculty of Computer Science, University of Vienna, Währingerstraße 29/S6, Vienna 1090, Austria e-mail: oliver.hoedl@univie.ac.at

© The Author(s) 2018

S. Papetti and C. Saitis (eds.), *Musical Haptics*, Springer Series on Touch and Haptic Systems, https://doi.org/10.1007/978-3-319-58316-7\_11

## **11.1 Introduction**

The role of the sense of touch in musical skills and the use of haptic devices to support musical activities are explored throughout this book. In this chapter, we focus on the use of haptics for learning fundamental rhythm skills, in particular skills typically learned through multi-limb coordination. The motivation for using haptics for this purpose relates to the different strengths and weaknesses of different sensory modalities. Vision is poor at tracking rhythms in real time, due to its lack of fine temporal discrimination, while hearing is considerably more accurate. However, when learning to recognise and play multi-limbed rhythms, neither hearing nor sight is well suited to communicate which limb does what and when, or how the limbs coordinate to form complex patterns. This is an area in which haptics can excel, by applying separate haptic signals to individual limbs. With this goal in mind, we have developed a system called the Haptic Bracelets and explore several applications in this chapter. The Haptic Bracelets are wearable haptic devices designed to help people learn multiple simultaneous (i.e. polyphonic) rhythmic patterns. Although the bracelets are fundamentally simple in conception, and although they make use of elements common in other haptic systems, in some respects they occupy a little explored part of the design space. In particular, they require different aspects of human cognition, perception and motor skills to be taken into account when considering some of the opportunities and affordances they present.

In simple terms, the bracelets are wearable haptic devices designed to be worn by an individual (or, for some applications, by pairs of individuals, or groups) on all four limbs (two wrists and two ankles). Each bracelet contains (Fig. 11.1): a high-resolution inertial measurement unit (IMU)<sup>1</sup>; precise, fast acting *vibrotactiles*<sup>2</sup> with a wide dynamic range; a processor; and a Wi-Fi module (RN-XV Wi-Fly<sup>3</sup>). Each set of four bracelets is coordinated by a master processor, typically on a smartphone or laptop. Where more than one user is involved, master processors communicate with one another.

In terms of basic operation, the bracelets can sense what actions a drummer is making with each limb and when. This can also be directly communicated from one drummer to another, as explored below. The bracelets have a range of musical applications, which we will consider in depth in this chapter, including the following:

• The Haptic iPod;

<sup>1</sup>Inertial measurement units typically combine accelerometers, gyroscopes and magnetometers.

<sup>2</sup>In the present chapter, the term "vibrotactile" is often used as a noun to mean "vibrotactile actuator".

<sup>3</sup>A now discontinued Wi-Fi solution.

**Fig. 11.1** A Haptic Bracelet, displaying the internals


The above applications can be valuable not just to drummers, but to any musicians who need a firm grasp on how rhythmic patterns interlock. Arguably, this applies to all musicians, but especially to those who play polyphonic instruments or who have complex rhythmic interactions with other players.

Interestingly, the Haptic Bracelets have also found applications in the digital health domain, particularly in rehabilitation for people with a range of movement-related neurological conditions including stroke, Parkinson's, Huntington's and brain trauma [1–4]. However, this is mostly outside the scope of this chapter.

There is a wealth of existing research on the use of haptics for communicating different kinds of information, for example notifications [5], posture improvement [6, 7], tempo synchronisation among musicians [8, 9] and more generally for conveying information about different categories of phenomena such as forces [10], shapes, textures, moving objects, patterns and sequence ordering, as reviewed in [11]. Conversely, there is rather less research on the use of haptics for communicating precise temporal patterns, especially multiple simultaneous temporal patterns. Work in broadly related parts of the design space is reviewed in Sect. 11.5.

In order to understand how people perceive and deal with rapid temporal patterns, it helps to be aware of theories of biological entrainment and neural resonance theory—both of which are reviewed in the next section.

## **11.2 Motivation and Theoretical Background**

The motivation and theoretical background for the Haptic Bracelets are drawn from a variety of sources, as we explore below. The original motivation for the bracelets came from music education, specifically Emil Dalcroze's Eurhythmics. Theoretical insights came from research in music perception by Bamberger [12], Lerdahl and Jackendoff [13], and others, as well as from work in ethnomusicology by Arom [14]. Once various prototype versions of the bracelets were built [2, 3, 11, 15, 16], research from cognitive science, particularly theories of human biological entrainment and neural resonance theory, proved invaluable in understanding key aspects of how humans interact with the bracelets.

## *11.2.1 Dalcroze Eurhythmics*

The Swiss music educator Emil Dalcroze (1865–1950) noticed that many of his students seemed to read and play music notation stiffly, as an abstract activity, with little evidence of feeling the rhythms in their bodies [17]. By contrast, when observing musicians in Algeria, he noticed that musicians seemed to feel music in their whole bodies, engaging more deeply with complex rhythms. Dalcroze devised a wide range of physical musical games, culminating in the educational system known as Dalcroze Eurhythmics,<sup>4</sup> still widely influential and in use today [17]. Amongst other things, this involves students listening to music while moving arms and legs independently, to mirror movement in different simultaneous streams in the music.

## *11.2.2 Metrical Hierarchies and Polyrhythms*

Further theoretical insights come from research in music perception and musicology, reflecting longstanding insights by musicians. To musical novices, musical rhythm may seem like "one event after another". However, as Lerdahl and Jackendoff and other theorists demonstrated, nearly all Western music is governed by metre. Metre may be viewed as a series of hierarchically coordinated and exactly synchronised temporal layers, each typically highly regular, with interesting exceptions [18]. While there are other vital aspects to rhythm, for example figures, duration, dynamics, accents and syncopation, nevertheless this means that many aspects of coordinating rhythm can be effectively offloaded from the cognitive system and onto the sensorimotor system by learning to assign different regular repeating patterns to each limb.<sup>5</sup> This can be learned by starting with just two limbs and then adding additional limbs. In some non-Western musical traditions, polyrhythmic organisation is used instead of hierarchical metre. In this case, the temporal layers are not organised hierarchically; however, each layer is still typically highly regular, and periodically all of the layers still reach synchronisation points [14]. Consequently, the same principles about moving load from the cognitive system onto the sensorimotor system are relevant.

<sup>4</sup>The band the Eurythmics was named after this educational approach.

<sup>5</sup>Interestingly, in some special cases, a useful educational strategy can be to shift the memorisation load for multi-stream rhythms in the other direction, for example from limb movement onto language processing, e.g. by using linguistic mnemonics [11].
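The difference between hierarchical metre and polyrhythm, and the periodic synchronisation points that both share, can be illustrated with a little arithmetic. The sketch below assumes that each layer's period is expressed as a whole number of a shared fast pulse; the function names are illustrative.

```python
from functools import reduce
from math import gcd


def cycle_length(periods):
    """Pulses until all layers coincide again (their least common multiple)."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def is_hierarchical(periods):
    """True if every layer's period divides the period of the next slower layer."""
    ordered = sorted(periods)
    return all(slow % fast == 0 for fast, slow in zip(ordered, ordered[1:]))


def sync_points(periods, total_pulses):
    """Pulse indices at which every layer has an onset."""
    return [t for t in range(total_pulses) if all(t % p == 0 for p in periods)]
```

With layers of periods 1, 2 and 4 (a simple metrical hierarchy), every onset of the slowest layer is a synchronisation point. With periods 3 and 4, neither layer subdivides the other, yet the two still coincide every 12 pulses.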

## *11.2.3 Cognitive Science: Entrainment and Neural Resonance*

In addition to domain-specific theories from music education, music psychology and musicology, various theories from cognitive science help to cast light on the Haptic Bracelets. The most widely applicable of these are the theories of embodied interaction [19], enactive cognition [20] and sensorimotor contingency [21]. Broadly speaking, these theories focus not just on purely mental processes, but on the physical enaction of target skills and on sensorimotor interactions that engage the whole body and give participants multi-sensory feedback on how their actions affect their surroundings. However, there are two theories from cognitive science that have much more specific relevance to learning multiple simultaneous rhythmic patterns, namely the theories of biological entrainment and neural resonance, considered below.

*Entrainment* is a term, originally from physics, to describe how two or more physically connected rhythmic processes interact with each other in such a way that they adjust towards and eventually "lock in" to a common periodicity or phase. However, the concept has been found to have rich and unexpected applications in perception, neuropsychology and music psychology at a variety of different levels [22–24]. At the interpersonal level, musicians have a strong tendency to entrain with each other when playing. This is more interesting than it might appear on the surface, because when two or more musicians play together—despite being demonstrably in time with each other—it may be the case that they rarely or even never play notes at the same time. In the case of entrained musicians, typically what is happening is that, instead of being entrained to the musical surface, both players are entrained to a beat (part of the metre or polyrhythm) that may often be implied rather than being explicitly sounded.

To sharpen this point, most people, musicians and non-musicians alike, are able to tap along metronomically to a monophonic melody or rhythm. However, at many points where a tap sounds, there may be no surface event in the music. Conversely, there may be many events in the music at which no tap occurs. What is particularly interesting about this, for our purposes, is that the ability to extract a beat from an irregular musical surface appears to be an almost exclusively human ability (with notable exceptions identified below). Theorists have created diverse computational and psychological theories to try to account for this ability and for the musical ubiquity of metre and polyrhythm. The best current explanation comes from neural resonance theory.

*Neural resonance* is a theory [23, 24] proposing that humans have a specialised neural organ, which consists of a bank of actively powered oscillators with temporal periods covering the range from about 0.2 to 2 s. Many phenomena in music perception can be well explained by the way in which these hypothesised oscillators tend to entrain with sensory input. Mathematical models of this organ, based on known characteristics of neural oscillators, are able to reproduce the results of human tapping experiments well, not just for metrical rhythms but also for polyrhythms [23]. The theory of neural resonance also helps to explain the origins of musical metre: given a simple regular external beat with frequency *f*, not only will the neural oscillator with frequency *f* entrain, but so will, to a lesser extent, those with frequencies *2f*, *3f*, *f/2* and *f/3*.
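As a rough illustration of the last point, the following sketch lists the related frequencies named above and checks which of them fall inside the hypothesised oscillator bank, reading the stated period range of about 0.2 to 2 s as a frequency range of 0.5 to 5 Hz. The function name and the hard cut-off at the range limits are illustrative assumptions and are not taken from the models in [23, 24]; the relative strength of entrainment is not modelled at all.

```python
from fractions import Fraction

# Period range of the hypothesised oscillator bank, as stated in the text (seconds).
MIN_PERIOD_S = Fraction(1, 5)   # 0.2 s, i.e. 5 Hz
MAX_PERIOD_S = Fraction(2)      # 2 s, i.e. 0.5 Hz

# Frequency ratios named in the text: f itself, 2f, 3f, f/2 and f/3.
RATIOS = [Fraction(1), Fraction(2), Fraction(3), Fraction(1, 2), Fraction(1, 3)]


def related_frequencies_in_bank(beat_hz):
    """Return the related frequencies (Hz) whose periods lie within the bank's range.

    Illustrative helper only: the hard cut-off at the range limits is an assumption.
    """
    f = Fraction(beat_hz)
    in_range = []
    for ratio in RATIOS:
        freq = f * ratio
        period = 1 / freq
        if MIN_PERIOD_S <= period <= MAX_PERIOD_S:
            in_range.append(freq)
    return sorted(in_range)


# A 120 bpm beat is 2 Hz; the candidates are 2/3, 1, 2, 4 and 6 Hz.
layers = related_frequencies_in_bank(2)
print([float(x) for x in layers])
```

For a 2 Hz beat, the 6 Hz candidate has a period shorter than 0.2 s and is therefore excluded under this assumed cut-off, leaving four candidate metrical layers.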

It was originally thought that beat extraction was unique to humans. Indeed, human neonates can extract beats at birth [24], whereas EEG studies have provided evidence that macaque monkeys are *unable* to extract beats [25]. However, it was unexpectedly discovered [26] that speech-imitating birds such as the sulphur-crested cockatoo *Cacatua galerita eleonora* have expert beat extraction abilities. The *vocal learning hypothesis* [26] suggests that rhythmic entrainment abilities may have developed evolutionarily as a by-product of vocal learning mechanisms.

## **11.3 Applications of the Haptic Bracelets**

In this section, we consider four categories of musical use of the Haptic Bracelets that we have prototyped and explored. There is some overlap, but the categories help to illuminate the design space and involve different software.

## *11.3.1 The "Haptic iPod"*

One of the many uses of the Haptic Bracelets is as part of a portable Haptic Music Player or "Haptic iPod" (Fig. 11.2). For this application, the user listens to music on a smartphone, but with the crucial feature that, in time with the music, they can feel in the appropriate limb (by vibrotactile pulses, as detailed below) which limb the drummer uses to strike a drum and when.

Users may engage with the system in a variety of ways to learn rhythms, for example by silently air drumming in time to the music, or if seated by tapping with hands and feet on nearby surfaces, or by "thigh slapping"—both recommended ways of learning rhythms [27]. It is straightforward for the system to sense virtual or actual impacts and to sonify with chosen drum sounds, should this be desired.

For those wishing to improve their sense of rhythm, or multi-limbed rhythmic skills, this Haptic iPod application has the potential to be a compelling application, for the following reasons.

Drummers who are already expert can play what they feel (or imagine) because they have played and felt similar rhythms many times before. When hearing a rhythm being played by another drummer, they may recognise it as something they can play, often already feeling in the imagination which limb should be playing which part of the multichannel rhythm. They have typically internalised a mental model of what a drummer's arms and legs can do, by playing and listening over many years to rhythms, watching, hearing and trying to replicate what other drummers play. By contrast, for those with little or no drumming experience, the step between hearing a multichannel rhythm and learning to play it is not automatically coupled with the feel of what each limb does. This may not be a major obstacle when hearing a single channel rhythm, provided that the tempo is within limits, and the complexity of the rhythmic pattern falls within the range of what can be grasped and memorised. However, when rhythms involve multiple channels and require multiple limbs to be played in a coordinated manner, the task is much harder. In these circumstances, a lack of experience with how different limb movements can interrelate and with how different limbs are associated with different drum sounds will weaken the ability to transfer from hearing to playing. This is where haptics can offer a distinctive advantage. Coupling multichannel musical rhythms to multichannel haptics allows a person to feel the different channels in different limbs, thereby easing the transition from hearing to playing, via feeling. A similar rationale applies to all of the applications of the Haptic Bracelets considered below.

**Fig. 11.2** A set of four Haptic Bracelets (lower left). Two users listening to music (right) and feeling what each limb of a drummer does and when, with the Haptic Bracelets acting as a Haptic iPod (upper right)

Crucially, the theory of entrainment plays a key role in this explanation. In particular, there is no suggestion that users will learn rhythms reactively by a process of stimulus response as in behavioural theories—reacting to each hit as it occurs. Such a process would not be well suited to temporal synchronisation. Rather, for typical musical materials, the streams for each limb will tend to consist of, predominantly but not exclusively, short repeating patterns or figures. Consequently, after initial listening, users are generally able to entrain to and reproduce the streams (see Sect. 11.4).

For the prototype version of this system, a laptop running a DAW<sup>6</sup> was used rather than a smartphone, and the stereo audio track had an associated, manually prepared, synchronised MIDI track that mirrored the drum part. The MIDI drum tracks were used to drive the vibrotactiles on the bracelets, as seen in [29]. In future versions of the system, no manual pre-processing of the audio need be involved: software for automatic drum part extraction could be used. However, this would identify drums rather than limbs, which has certain limitations; this design issue is discussed in Sect. 11.5.
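To make the idea of driving per-limb vibrotactiles from a MIDI drum track concrete, here is a minimal sketch that groups drum events into one stream of pulse onsets per limb. It is not the prototype's software: the function name and, in particular, the drum-to-limb table are hypothetical, and a fixed table of this kind is exactly where the "drums rather than limbs" limitation arises. The note numbers follow the General MIDI percussion map.

```python
from collections import defaultdict

# ASSUMPTION: a hypothetical drum-to-limb assignment for a right-handed player.
# Note numbers follow the General MIDI percussion map (36 bass drum, 38 snare,
# 42 closed hi-hat, 44 pedal hi-hat).
NOTE_TO_LIMB = {
    36: "right_foot",
    38: "left_hand",
    42: "right_hand",
    44: "left_foot",
}


def limb_pulse_streams(drum_events):
    """Group (onset_seconds, midi_note) events into per-limb lists of onset times."""
    streams = defaultdict(list)
    for onset, note in sorted(drum_events):
        limb = NOTE_TO_LIMB.get(note)
        if limb is not None:          # ignore notes with no assigned limb
            streams[limb].append(onset)
    return dict(streams)


# One bar of a basic 8-beat pattern at 120 bpm (eighth notes are 0.25 s apart).
bar = [(i * 0.25, 42) for i in range(8)]        # hi-hat on every eighth note
bar += [(0.0, 36), (1.0, 36)]                   # bass drum on beats 1 and 3
bar += [(0.5, 38), (1.5, 38)]                   # snare on beats 2 and 4
streams = limb_pulse_streams(bar)
print(streams["left_hand"])
```

Each resulting list could then be used to schedule pulses on the bracelet worn on that limb; how the pulses are actually generated is outside the scope of this sketch.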

## *11.3.2 Drum Teaching with Haptic Bracelets*

The Haptic Bracelets operate rapidly enough to be used for real-time synchronisation between musicians. This enables a drum teacher (Fig. 11.3, right) and learner (Fig. 11.3, left) to both wear a set of bracelets, and for the learner to feel in the appropriate limb which limb the teacher uses to strike each drum, effectively in real time [3, 29]. Figure 11.4 shows the control interface for adjusting the tap detection of each limb of the teacher's devices and mapping them to the learner's bracelets. The impacts felt by each limb are detected by fast sensors, signals are sent by Wi-Fi, and the system uses fast-acting, precise vibrotactiles. Consequently, communication delays are generally stable and under 10 ms. Taking into account the speed of sound in air, this means that synchronisation via the bracelets over a network can be as close as is generally achieved by musicians playing at a distance of about 3.5 m from each other, which is considered real time for most musical purposes. Depending on the quality of the Wi-Fi router and other system factors, beats can exceptionally be delayed or lost, but because the key working principle is entrainment, occasional small disturbances do not matter greatly.
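The comparison between a 10 ms delay and a playing distance of roughly 3.5 m can be checked with a one-line calculation. The sketch below assumes a speed of sound of 343 m/s (dry air at about 20 degrees Celsius), a value not stated in the text; the function name is illustrative.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # ASSUMPTION: dry air at about 20 degrees Celsius


def equivalent_distance_m(delay_ms, speed_m_per_s=SPEED_OF_SOUND_M_PER_S):
    """Distance that sound travels through air during the given delay."""
    return speed_m_per_s * delay_ms / 1000.0


print(round(equivalent_distance_m(10), 2))
```

Under this assumption, a 10 ms delay corresponds to about 3.4 m of acoustic travel, in line with the figure quoted above.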

Teaching in this way can be in person or over a distance, live or recorded, and one-to-one or one-to-many. Haptic recordings can be played back later and slowed down for more detailed study, with limbs muted or isolated as needed.

<sup>6</sup>Digital Audio Workstation: A software programme for recording, editing and producing audio content.

**Fig. 11.3** A drum learner (left) feeling what his drum teacher (right) is doing with each limb in real time. This particular photograph shows a silent air-drumming exercise, without drumsticks, with the learner looking away

**Fig. 11.4** A screenshot of the software for adjusting the tap detection of one haptic bracelet set and mapping it to another set

## *11.3.3 Musician Coordination and Synchronisation*

The mode of operation of the Haptic Bracelets outlined above has more general applications for musician coordination and synchronisation. The Bracelets can be used to address the problem that, in complex situations, crucial cues between musicians can be missed in the recording studio or live on stage.

**Fig. 11.5** Rudimentary two-handed rhythm: paradiddle

**Fig. 11.6** Syncopated rhythm: Cuban clave pattern

**Fig. 11.7** Polyrhythm: three against four

Specific modes of use include silent count-ins, hierarchical or polyrhythmic click tracks, confirmation of correct device operation and inter-musician communication, and coordination generally. The idea of a silent count-in is straightforward and is not new: however, in the case of complex metres or complex polyrhythms, the bracelets allow silent hierarchical or polyrhythmic count-ins that explicitly enact up to four layers of the metre or polyrhythm simultaneously to be felt in the appropriate limb. Haptic count-ins and section announcements could variously be driven by a metronome or MIDI score on a DAW, driven by a tapping foot, or by other physical actions of a musician, sounded or silent. In device feedback mode, the correct operation of foot pedals and other controllers can be confirmed by haptic feedback—a sophisticated version of this idea has been explored extensively by [28].

## *11.3.4 Teaching Multi-limb Drum Patterns by Multi-limbed Haptic Cueing*

The application of the Haptic Bracelets that we have explored most extensively is teaching multi-limb drum patterns (such as in Figs. 11.5, 11.6 and 11.7) using audio and haptic recordings, as studied in the next section.

## **11.4 Experimental Results**

In this section, we review a series of experiments carried out to test the applicability of haptics for learning rhythm skills. These experiments used a variety of technological and methodological set-ups: earlier experiments used wired systems [15, 29] that sensed which drums were hit and when, whereas our later systems are fully wireless and sense which limbs move and when [3, 16].

## *11.4.1 Supporting Learning of Rhythm Skills with the Haptic Drum Kit*

Our first haptic guidance system was called the Haptic Drum Kit [15]. Its main aim was to support the learning of rhythm skills and multi-limb coordination while playing drums.

The haptic pulses sent to a particular limb indicate the exact moments at which notes should be played with that limb, on a specified part of the drum kit, i.e. hi-hat, ride cymbal, snare drum or kick drum. Because each rhythm is played repeatedly in a loop, the user can listen to and/or feel the pattern before trying to play along with one or all limbs. In other words, the aim of our design is deliberately not to orchestrate stimulus response but rather to foster entrainment.

The original Haptic Drum Kit system consists of the following: vibrotactiles attached to the wrists and ankles using velcro bands; a computer system that feeds signals to the haptic devices; a stereo audio system; and a MIDI drum kit, which is played by the person while wearing the haptic devices.

The MIDI drum kit is connected to the computer running sequencing and recording software (Logic Pro), which allows playback as well as accurate data collection. In the study, MIDI files encoding drum patterns (known as "guide tracks") were played back by the sequencer to control the generation of audio output and synchronised haptic output. The vibrotactile output signals were generated through a programme written in Max and an Arduino board, which was connected to the actuators by wires.

Presentation was possible in one of the three following modes: audio only; audio plus haptics; or haptics only. The stereo audio system was used to play back both the sound created by playing the MIDI drum kit and the sound from the guide track, when required. In the study, the participants were also recorded on video from three different angles.

To explore what kinds of rhythmic patterns could be supported best by using haptic guidance, twenty reference rhythms were selected as stimuli, drawn from four broadly representative technical categories: (1) metrical rhythms, i.e. 8 beat and 16 beat; (2) rudimentary patterns that distribute continuous strokes across two limbs, e.g. the alternation of single and double strokes in the paradiddle (see Fig. 11.5); (3) figural rhythms, involving syncopation, based on the Cuban clave (see Fig. 11.6); and (4) polyrhythms, e.g. 2 versus 3, 3 versus 4 (see Fig. 11.7), 2 versus 5, 4 versus 5. The rhythms included patterns for two, three and four limbs.
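The polyrhythms in category (4) can be laid out on a common grid, which also shows where the layers coincide. The sketch below is only an illustration of that idea (the function name and the text rendering are assumptions, not part of the study software): one cycle is divided into as many equal steps as the least common multiple of the two layer counts.

```python
from math import gcd


def polyrhythm_grid(a, b):
    """Return (steps, layer_a, layer_b) for an a-against-b polyrhythm.

    One cycle is divided into lcm(a, b) equal steps; each layer is a list of
    the step indices on which that layer sounds.
    """
    steps = a * b // gcd(a, b)
    layer_a = list(range(0, steps, steps // a))
    layer_b = list(range(0, steps, steps // b))
    return steps, layer_a, layer_b


steps, threes, fours = polyrhythm_grid(3, 4)
for name, layer in (("3", threes), ("4", fours)):
    print(name, "".join("x" if i in layer else "." for i in range(steps)))
# Steps on which both layers sound together within one cycle.
print(sorted(set(threes) & set(fours)))
```

For 3 versus 4, the cycle has twelve steps and the two layers coincide only on the first step of each cycle, so each layer is perfectly regular on its own while the combined surface is not.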

Afterwards, a structured interview was carried out with each participant to explore their views on the Haptic Drum Kit and the three conditions used in the experiment. Of the five participants, four were beginners, while one had five years of experience drumming in rock bands and taking drumming lessons.

Although there were some interesting individual differences (see [15] for details), the results can be generally summarised as follows. All participants expressed an interest in using the Haptic Drum Kit again, and most found the system comfortable to wear. However, all participants found the audio clearer than the haptic presentation to attend to, and all found it easier to play in time with the audio than the haptic stimuli. Of the three forms of presentation (audio only, haptic only and audio plus haptic), all preferred audio plus haptic, indicating that the haptics were considered to have added value.

The vibrotactile drivers for this version of the Haptic Drum Kit (version 1) appeared to have three weaknesses for our purposes, according to feedback from the five participants in the study: (1) the haptics were not felt clearly enough, especially on the ankles; (2) the attack of the haptic pulses was somewhat blurred, making it difficult to recognise the precise timing of a note to be played; and (3) there was no relative emphasis of haptic pulses, which made it hard to clearly differentiate the beginning of the looping pattern.

## *11.4.2 Learning Multi-limb Rhythms with Improved Haptic Drum Kit*

To address the weaknesses of the first version of the Haptic Drum Kit, an improved version was developed. This second version of the Haptic Drum Kit employs four C2 tactors<sup>7</sup> as the vibrotactile devices. They use linear resonant actuators (LRAs) rather than the more common eccentric rotating mass (ERM) actuators, which allows the tactors to deliver very clear haptic signals with a very low start-up time (around 4 ms). Details on these actuator technologies can be found in Sect. 13.2. The tactors are secured to the limbs using elastic velcro bands. As with the earlier version of the system, a MIDI drum kit is used to play and record the drum sounds.

An experiment was carried out using this system with 16 participants (eleven with varying degrees of drumming experience, five without) to see whether this version was more suitable for our purposes and to explore in more detail the effects of haptic guidance on learning of rhythms, for four different kinds of rhythmic stimuli that all require multi-limb coordination. These stimuli form a subset of the rhythms used in the previous study:

• Linear rudiments (e.g. paradiddle);

<sup>7</sup>https://www.eaiinfo.com/tactor-landing/ (last accessed on November 8, 2017).


After the playing sessions, questionnaires were used to gather participants' feedback on the different conditions. During subsequent analysis, the participants' performance was manually scored by an experienced percussionist in terms of accuracy and timing, and times were recorded for the moment at which a particular pattern was first attempted and when it was first played correctly.

The results of this study were very encouraging. They indicated that haptic stimuli can be used as a reasonable alternative to audio stimuli in drumming instruction for the various kinds of rhythms employed, achieving similar results in terms of learning speed, i.e. the time required to learn to play an exercise correctly. For accuracy, there were individual differences which seemed related to the participants' previous experience in drumming and playing along with metronomes.

For less experienced drummers, accuracy was highest in the haptic condition and lowest in the audio condition, while for the most experienced drummers there was little difference between conditions. Regarding timing, beginners performed best with audio plus haptics, whereas experts performed best with audio only. The data from the questionnaires showed that haptic guidance for multi-limbed drumming was generally well liked, and given a choice between audio, haptic or both audio and haptic presentation, 14 participants preferred audio plus haptic. Most participants enjoyed using the Haptic Drum Kit, found the tactors comfortable to wear, and all except one said they would like to use the system again.

Comparing the different haptic devices, i.e. the vibrotactiles used in version 1 and the tactors used in version 2, the tactors provided better results, both in terms of observable performance and subjects' attitudes.

## *11.4.3 Passive Learning of Multi-limb Rhythm Skills*

To find out whether haptically supported learning of similar multi-limb rhythm skills could also take place while the learner is attending to another task, away from the drums, an experiment was carried out to investigate the possibility of passive learning of rhythms while reading [11]. Fifteen people participated in the experiment (eight men and seven women), aged 15–51. Three were experienced drummers (with approximately 10 years of experience playing the drums), five had a little drumming experience, and seven had no experience with drumming.

The technology used in this study was an early version [29] of the Haptic Bracelets. For practical reasons, the system used for this study was wired and stationary, to ensure the maximum possible reliability of timing data. This version of the Haptic Bracelets employed C2 tactor vibrotactiles attached to each wrist and ankle, using elastic velcro bands. The tactors were driven by multichannel signals from a DAW.

The experimental procedure consisted of a pretest phase, a passive learning phase and a post-test phase, as follows. In the pretest phase, participants were asked to play a series of six rhythms (requiring multi-limb coordination, as in the previous study) on a drum kit, guided simply by audio recordings. These performances provided a base reference for later comparisons. During the following passive learning phase, away from the drum kit, participants were asked to carry out a 30-min reading comprehension test. Participants were asked to focus on getting the best possible scores on the comprehension test.

During the comprehension test, just two of the six rhythms from the set were haptically "played" (without audio) to each subject via the vibrotactiles attached to wrists and ankles. Different pairs of rhythms were chosen for different subjects, so that clear distinctions could be made in the next phase. Within that constraint, in order to present an adequate challenge for each subject, choices were made of more or less rhythmic complexity to reflect different levels of previous playing experience.

In each case, the two rhythms were played repeatedly, alternating every few minutes. In the post-test phase, subjects were asked to play again at the drum kit the complete set of rhythms from the pretest, including the two rhythms to which they had been passively exposed. Finally, a questionnaire was used to gain feedback from the participants about their experiences during the experiment and their attitudes towards the Haptic Bracelet technology.

The results from the participants' subjective evaluations can be summarised as follows (for detail, and the complete set of responses from which a selection is provided here, see [11]).

Most participants thought that the technology helped them to understand rhythms and to play rhythms better, and most preferred haptic to audio to find out which limb to play when. Most participants indicated that they would prefer using a combination of haptics and audio for learning rhythms to either modality on its own.

Interesting quotes from participants in response to the open question "Are there things that you liked about using the technology in the training session?" included the following, all from different participants:

It helped to differentiate between the limbs, whereas using audio feedback it is often hard to separate limb function.

Clarity of the haptics. 'seeing' the repeated foot figure in the son clave.

Being able to flawlessly distinguish between which limb to use. The audio is more confusing.

The question "Are there things that you like about the haptic playback?" resulted in responses such as the following:

It makes the playing of complex patterns easier to understand.

Easier to concentrate on the particular rhythms within a polyrhythm (than audio only).

That you could easily feel which drums you needed to play when and how quickly it went on to the next beat.

The answers from participants to the question "Are there things that you don't like about the haptic playback?" included the following:

repetition gets irritating 'under the skin'

The ankle vibrations felt weak on me and I had to concentrate hard to feel them.

Just initially strapping on the legs. [Lack of] portability.

All quotes above are selected from [11].

In other words, there seems to be room for improvement in several respects: the feel of the haptics and the straps, especially after longer use; the inconvenience of the wires; and the lack of personally adjustable strength levels for the haptic signal on each limb. The last two points have already been addressed in more recent versions of the Haptic Bracelets, which are portable, wireless, and have individually adjustable levels.

## **11.5 Related Work**

As noted earlier, there is much research on the use of haptics for communicating different kinds of musical information, for example notifications [5], posture improvement [7], tempo synchronisation [8, 9], haptic guidance or augmentation in general [30–32] (see also Chaps. 6, 8, 9, 12, 13 and Sect. 10.3) and the effect of haptic feedback on quality perception and user experience [33, 34] (see also Sect. 5.3.2.2, Chaps. 6 and 7). However, in this section we focus principally on haptics for rhythm skills, particularly, though not exclusively, as regards multiple simultaneous streams of rhythms. We will group broadly representative strands of research in this area as follows:


Having reviewed the approaches used in this work, we then compare and contrast them with modes of use of the Haptic Bracelets (as considered in Sect. 11.3). The resultant contrasts help to illuminate various design dimensions for haptics for developing rhythm skills.

One straightforward use of haptics in developing rhythm skills is as *haptic metronomes*. Recently, commercial versions of haptic metronomes have come on the market.<sup>8</sup> Giordano and Wanderley [9] demonstrated formally that musicians can reliably follow a tempo set by a haptic metronome. This research showed that deviation from the target inter-onset interval was comparable between the auditory and the tactile modality.
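One plausible way to quantify deviation from a target inter-onset interval (IOI) is sketched below. This is an illustrative measure under stated assumptions, not necessarily the analysis used in [9]; the function name and the tap times are made up for the example.

```python
def mean_abs_ioi_deviation_ms(tap_times_s, tempo_bpm):
    """Mean absolute deviation (ms) of tapped inter-onset intervals from the target."""
    if len(tap_times_s) < 2:
        raise ValueError("need at least two taps to form an interval")
    target_ioi_s = 60.0 / tempo_bpm
    iois = [later - earlier for earlier, later in zip(tap_times_s, tap_times_s[1:])]
    deviations = [abs(ioi - target_ioi_s) for ioi in iois]
    return 1000.0 * sum(deviations) / len(deviations)


# Made-up tap times (seconds) against a 120 bpm target (0.5 s between beats).
taps = [0.00, 0.51, 1.00, 1.52, 2.00]
print(round(mean_abs_ioi_deviation_ms(taps, 120), 1))
```

Computing the same measure for taps made to an auditory metronome and to a haptic one would give two numbers that can be compared directly across modalities.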

Several projects have applied haptics to *multiple areas of the body* for music-related purposes, sometimes via specialised haptic garments [35] (see also Sect. 10.3) and even via furniture [36]. However, the emphasis in these projects is generally not on multi-stream rhythm skills. In many cases, the focus is on exploring *novel aesthetic*

<sup>8</sup>For example, the Soundbrenner Pulse http://www.soundbrenner.com and the Peterson BodyBeat Pulse https://www.petersontuners.com/shop/Metronomes/ (last accessed on November 8, 2017).

*haptic perceptual effects,* such as in the case of [33, 37]. In some projects of this kind [36], the focus is strongly on *Deaf culture,*<sup>9</sup> and on the use of crossmodal devices and *sensory substitution* [38] to convey musical information through the sense of touch, particularly for the profoundly deaf. In this context, Fulford [39] has investigated the extent to which tonal intervals can be accurately communicated by touch. Jack et al. [37] have collaborated with Deaf arts activists to produce furniture that translates pitch, rhythm, loudness and timbre to whole body vibration in psychometrically well-informed ways.

Some work applying haptics to the whole body (or large parts of the body) may have some implications for improving skills related to multi-stream rhythms. An interesting example is a tension-based wearable vibroacoustic device by Yamakazi et al. [40]. This device uses a cord worn around the chest, whose tension is adjusted by DC motors directly driven by an amplified analogue audio signal. This system permits the communication of an acoustic signal with finely detailed bass clarity into the entire chest cavity. Users scored the experience favourably particularly in music with prominent bass drum parts. Although this system does not spatially separate multiple rhythms, its bass clarity may help wearers in separating low-pitched rhythm parts.

A contrasting system with clear potential relevance to multi-stream rhythm skills is MuSS-bits by Petry et al. [41]. Designed with deaf users in mind, this system uses wireless sensor–display pairs that map audio microphone signals more or less directly to the voltage applied to vibrotactiles, which can be attached anywhere on the body.

One strand of work has focused on haptics for temporal sequencing, particularly for monophonic rhythms and monophonic melodies, though recently the scope has widened [42, 43]. Huang et al. [44, 45] and Siem et al. [46, 47] carried out a series of studies looking at *passive learning* (i.e. learning without conscious attention) of tasks involving sequential key presses, such as typing or playing piano melodies. A lightweight wireless haptic system was developed for the purpose, with a fingerless glove containing one vibrotactile per finger. This system was used to teach sequences of finger movements to users while they performed other tasks. A sequence of finger movements learned in this way, if subsequently repeated with the five fingers placed over five adjacent keys on a musical keyboard, serves to play a monophonic melody. Target melodies were typically restricted to five pitches, so that no horizontal movement of the hand (as opposed to vertical movement of the fingers) was needed. Sample melodies contained rests and notes of different durations. A study demonstrated that passive learning with audio and haptics combined was significantly more effective than audio only. A more recent study [47] involved passively training both hands simultaneously with material that was monophonic in the right hand but included simple repeating two-note chords in the left hand. This work demonstrated that users can learn to play the parts for both the left and right hands at once via passive haptic learning. The work by Grindlay [42] focused on passive learning of monophonic

<sup>9</sup>*Deaf culture* (with a capital D) refers to a set of cultural values, behaviours and traditions associated with deafness viewed as a distinctive and valuable human experience, as opposed to a disability.

drum rhythms, with a mechanical installation providing haptic guidance by automatically moving a single drumstick held by the learner. The results of this study showed that the system supported learning of rhythms which can be played with one hand.

A project that takes involuntary control of a learner's movements to extremes is the Possessed Hand [48]. This system allows control of a user's finger movements by applying electrical stimuli to the associated muscles using a belt with 28 electrode pads placed around the forearm. The makers suggest this system could be applied to musical applications, in particular learning correct hand posture for playing the piano or koto, but they mention there are issues to be considered related to reaction rate, accuracy and muscle fatigue. This research is highly unusual in terms of the test subjects' comments, which include "Scary… just scary" and "I felt like my body was hacked" [48, p. 550].

As noted earlier, we will now compare and contrast the above work with various modes of use of the Haptic Bracelets in order to illuminate various dimensions of the interaction design space for the haptic support of rhythm skills.

One such design dimension contrasts *metronomic cueing versus interpersonal rhythmic interaction*. Commercial haptic metronomes are excellent tools for practising to a beat. Like the Haptic Bracelets, they can allow several musicians wirelessly to coordinate by sharing a common haptic metronomic beat or to be coordinated by cues from a MIDI score on a DAW. However, the current commercial haptic metronomes cannot track live limb movement so cannot, for example, deliver real-time multi-limb polyphonic drumming instruction from a drum teacher, as in the case of the Haptic Bracelets (Sect. 11.3.2). For many purposes, metronomic cueing is sufficient, but live interpersonal entrainment affords additional expressive, musical and educational possibilities.

A second design dimension involves the contrast between *discrete versus analog* haptic mapping. By *analog* mapping, we refer to simple mapping of an audio signal—typically amplified and filtered—to a vibrotactile transducer, as opposed to representing rhythmic events by discrete pulses. In the case of [41] and much of the work aimed at whole body experience or Deaf culture, the haptic signals are typically more or less direct mappings of audio signals. By contrast, the Haptic Bracelets and commercial haptic metronomes use discrete haptic signals to represent events in rhythmic patterns. Discrete haptic signals need not be uniform—they can have different intensities, lengths and envelopes, for example to represent accents or textures when driven by a MIDI score. Analog haptics can communicate greater subtlety of texture, and continuous (as opposed to discrete) signals play important roles in deliberately designed haptic perceptual illusions [36]. However, for some purposes discrete pulses can give useful simplicity to the representation of discrete musical events.

Choices in the system used for sensing rhythmic events can have interesting design implications when representing *polyphonic* rhythms, especially when taking cues from a live drummer or teacher. MuSS-bits [41] offers an instructive contrast in this respect with the Haptic Bracelets. MuSS-bits uses analog wireless sensor–display pairs that map microphone signals directly to vibrotactiles. Such a system can readily be used to route different haptic signals onto different limbs, but a simple microphone is less well suited to detecting which *limb* is striking a drum and when, and better suited to detecting which *drum* has been struck. This can have advantages in situations where the same limb plays more than one drum, but can have disadvantages where, for example, two limbs alternate in their playing of a single drum (Fig. 11.5).

Yet another design dimension involves the *choice of body location(s)* when applying haptics. Different locations have different advantages for different applications. For example, as noted earlier, the tension-based system by Yamakazi et al. [40] allows clear communication through the chest of highly detailed bass vibrations, whereas Lewiston [43], Huang et al. [44, 45] and Siem et al. [46, 47] focus on individual fingers, and the Haptic Bracelets focus primarily on the limbs. MuSS-bits by contrast emphasises flexibility in choice of body locations for its wireless sensor–display pairs. Choice of body location for haptics can have a variety of subtle effects on the perception of haptic signals beyond the scope of this chapter—a general discussion of this issue can be found in [49].

Finally, there is an important difference between the work by Grindlay [42], Tamaki et al. [48] and our own, related to the dimension of *control*. Although very different, their systems are both able to physically control human movements, while in our work (and most other related work) the haptics only communicate signals to guide the user's movement, and the user remains in control of all physical actions.

## **11.6 Conclusions**

Music is an evolutionarily ancient human activity [50], and rhythm plays a fundamental role in it. Understanding and playing several rhythms simultaneously is one of the most challenging rhythm skills to learn. In this chapter, we have argued that of all the sensory modalities, touch has a special role to play in learning and teaching multi-limbed rhythms. This is because it allows different rhythmic components to be directly experienced simultaneously but separately in the relevant limbs. When experiencing rhythms haptically in this way, users find it relatively easy to mentally direct their attention to the sensations in any single limb or arbitrary combinations of limbs [11]. In many other musical applications of haptics, the user is simply called upon to be *reactive,* e.g. to respond to notifications, feedback or guidance, or to passively experience aesthetic effects. By contrast, the use of haptics in support of rhythm skills draws on sophisticated *predictive* skills, in particular the distinctively human capability of biological entrainment.

For the above reasons, we designed and built a series of systems, starting with Haptic Drum Kit and more recently the wireless Haptic Bracelets [3, 16]. We have used these systems to study new ways of learning rhythm skills. They all provide multiple streams of haptic signals to the body using vibrotactile devices around the wrists and ankles to guide the timed movement of these limbs in time with repeated rhythmic stimuli. The development of this work was inspired by research from various fields, including music education (e.g. Dalcroze Eurhythmics), musicology, music psychology and cognitive science, in particular the theories of biological entrainment, and neural resonance.

In this chapter, we have described several applications of the wireless Haptic Bracelets, including: (1) a portable Haptic Music Player, or "Haptic iPod", which provides four channels of vibrotactile pulses that track drum parts in time with the music; (2) live interactive drum teaching with Haptic Bracelets worn by both teacher and learner, enabling the learner to feel in the appropriate limbs what the teacher is playing; (3) musician coordination and synchronisation, using the Haptic Bracelets to communicate musical cues such as count-ins, multichannel click tracks or section announcements in situations where audio may not be appropriate, such as recording studios or live on stage—these may be driven by a metronome, DAW or physical actions of a musician; and (4) teaching multi-limb drum patterns by multi-limbed haptic cueing.

Focusing on the last type of application, we have carried out three empirical studies with different versions of the Haptic Drum Kit and Haptic Bracelets to evaluate their usability and usefulness for this purpose. There was evidence that:


Compared to related work on using haptics for music education, our approach seems to be unique in the focus on supporting the acquisition of rhythmic skills that involve multi-limb coordination by providing multichannel haptic signals to both wrists and ankles, although the Haptic Bracelet technology is flexible enough to support a range of other applications.

Several areas of further research are suggested by this work, with relevance to various disciplines, including music perception, cognition and production; music education; music and the deaf; human synchronisation; sports science; neuroscience; and digital health. More empirical studies are needed to better understand factors that may affect the learning of multi-limb rhythm skills, including:


More attention is needed to factors such as different levels of drumming experience; the selection of rhythms and types of guidance provided (audio, haptic, visual or combinations). Better techniques are needed for automated analyses of drumming performance, characterising timing and accuracy in coordination of the limbs. We need to better understand the interplay between cognitive (e.g. symbolic) and embodied (e.g. Haptic Bracelets) approaches to internalising multiple simultaneous rhythms. Other directions for future work include investigating music-teaching applications that make use of the increased level of interactivity between teachers and learners provided by systems such as the latest version of the Haptic Bracelets. These systems may have particular relevance for deaf musicians. Finally, more work is needed on applications of the Haptic Bracelets in therapeutic settings in the health domain, e.g. combining musical stimuli with haptic guidance to support rehabilitation of walking skills for survivors of stroke and other neurological conditions.

## **References**



# **Chapter 12 Touchscreens and Musical Interaction**

**M. Ercan Altinsoy and Sebastian Merchel**

**Abstract** Touch-sensitive interfaces are increasingly used for music production. Virtual musical instruments, such as virtual pianos or drum sets, can be played on mobile devices like phones. Audio tracks can be mixed using a touchscreen in a DJ set-up. Samplers, sequencers or drum machines can be implemented on tablets for use in live performances. The main drawback of traditional touch-sensitive surfaces is the missing haptic feedback. This chapter discusses whether adding specifically designed vibrations helps improve the user interaction with touchscreens. An audio mixing application for touchscreens is used to investigate whether tactile information is useful for interaction with virtual musical instruments and percussive loops. Additionally, the interaction of auditory and tactile perception is evaluated. The effect of loudness on haptic feedback is discussed using the example of touch-based musical interaction.

## **12.1 Introduction**

The usage of touch-sensitive interfaces has rapidly increased over the last 10 years, partially due to many successful applications for smartphones and tablets. Another reason is the enhanced interaction capabilities of touchscreens in comparison with the mouse. For example, their multi-touch capability allows the device to recognise more than one point of contact. Gesture-based communication can be realized easily using touchscreens. Additional interface elements, such as buttons, knobs, sliders, can be individually arranged depending on the application. These aspects make devices with touch-sensitive surfaces very interesting for music-based applications. Virtual musical instruments as well as audio mixing and music composition applications benefit strongly from this trend. There are various apps which try to simulate existing musical instruments or to create new music experiences (Fig. 12.1).

M. E. Altinsoy (B) · S. Merchel

Institut für Akustik und Sprachkommunikation, Technische Universität Dresden, Helmholtzstr. 18, 01069 Dresden, Germany e-mail: ercan.altinsoy@tu-dresden.de

S. Merchel e-mail: sebastian.merchel@tu-dresden.de

<sup>©</sup> The Author(s) 2018

S. Papetti and C. Saitis (eds.), *Musical Haptics*, Springer Series on Touch and Haptic Systems, https://doi.org/10.1007/978-3-319-58316-7\_12

**Fig. 12.1** Digital touch instrument apps: **a** piano, **b** drum and **c** liveloops from the GarageBand (http://www.apple.com/ios/garageband/, last accessed on 25 Nov 2017) DAW, **d** sound objects (https://itunes.apple.com/us/app/sound-objects/id656640735?mt=8, last accessed on 25 Nov 2017)

Wanderley and Battier [1] described the importance of gestures and their recognition for music performance. Choi categorized gestural primitives as trajectory-based primitives, force-based primitives and pattern-based primitives. Several of these primitives can be recognized using touch-sensitive interfaces [2].

Several table-based interfaces for musical applications have been developed recently: the Reactable (Rotor<sup>1</sup>), Akustisch<sup>2</sup>, Bricktable, Surface Music, Sound Storm<sup>3</sup> or ToCoPlay [3–6]. Most of these devices use a tangible interface where the player controls the system by means of real objects. Musical applications running on touchscreen devices such as smartphones and tablets followed this trend. However, not only gesture recognition but also haptic feedback plays an important role in the success of such applications. The missing haptic feedback in touchscreen-based devices strongly limits the capabilities of the system. The design of musical applications calls for the addition of advanced haptic feedback [7, 8]. For audio mixing, music composition applications and musical performances, touchscreen systems with haptic feedback are very promising.

<sup>1</sup>http://reactable.com/rotor/ (last accessed on 17 Nov 2017)

<sup>2</sup>http://modin.yuri.at/tangibles/data/akustisch.mp4 (last accessed on 17 Nov 2017)

<sup>3</sup>http://subcycle.org/ (last accessed on 25 Nov 2017)

Several technical solutions have been developed for haptic feedback integration in touchscreen devices. Various types of low-cost and compact actuators are currently used in consumer electronics, having different characteristics [9]. In recent years, electrostatic and ultrasonic technologies have been researched for use in haptic interfaces. On touchscreens using electrostatic technology, finger movements over the touch surface induce an electric force field due to electrostatic friction [10, 11]. Various systems exist based on ultrasonic technology such as mid-air (no direct contact with the surface) [12, 13] or touch interfaces [14–16]. The latter employ ultrasonic vibrations to create a squeeze film of air between the vibrating surface and the fingertip, thus modulating the surface's friction. Focused ultrasound is capable of inducing tactile, thermal and tickling sensations [17, 18]. Neither electrostatic nor ultrasonic technology uses any moving parts.

Over the last few years, the authors have conducted several investigations with touchscreen-based devices to understand and improve the capabilities of such systems for musical applications [19–24]. In this chapter, various aspects of these investigations are summarized, extended and discussed. In particular, musical interactions with touchscreens require both auditory and haptic perception to be considered. In most cases, the haptic feedback is generated by means of the audio signal; therefore, the interaction of both is an important issue. This chapter aims to illustrate some fundamental aspects of haptic and audio feedback for touchscreen-based musical applications and introduce the benefits of audio–tactile interaction.

## **12.2 Perceptual Aspects of Auditory and Haptic Modalities for Musical Touchscreen Applications**

Playing a musical instrument is a complex task, and optimized multisensory stimuli may be useful, e.g. supporting spatial and temporal accuracy. Sound and vibration are physically coupled while playing a musical instrument or listening to music live or through loudspeakers. The knowledge of auditory and haptic psychophysics is necessary for the designer of multimodal interfaces to develop high-quality devices. In this section, perception of intensity, frequency and temporal aspects is discussed with respect to their importance to musical applications.

## *12.2.1 Intensity*

**Fig. 12.2** Growth of perceived magnitude as a function of sensation level for acoustical and vibratory stimuli at 250 Hz [19, 21, 22]

The dynamic ranges of auditory and tactile perception differ greatly. Although the perceivable dynamic range for hearing is approximately 130 dB, tactile perception can only discriminate a dynamic range of 50 dB. The just-noticeable differences (JNDs) in level for both modalities are about 1 dB. In music applications, such dynamic range differences should be taken into account, especially if haptic feedback is produced using audio signals: The perceived vibration magnitude might rise rapidly from imperceptible to strong if vibrations are generated from an audio signal with a wide dynamic range. Therefore, it might be advantageous to apply dynamic compression [21].
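The idea of dynamic compression can be illustrated with a minimal sketch. It assumes that levels are expressed as sensation levels (dB above threshold) and that the roughly 130 dB auditory range is mapped linearly onto the roughly 50 dB tactile range. Both the linear mapping and the function name `compress_level` are illustrative assumptions, not the processing used in [21].

```python
AUDIO_RANGE_DB = 130.0    # approximate dynamic range of hearing (from the text)
TACTILE_RANGE_DB = 50.0   # approximate dynamic range of touch (from the text)


def compress_level(audio_level_db: float) -> float:
    """Map an audio sensation level (dB) to a vibration sensation level (dB).

    Assumption: a static, linear mapping of the whole auditory range onto
    the whole tactile range, i.e. a compression ratio of 130:50 = 2.6:1.
    """
    # Clamp to the range the mapping is defined for.
    clamped = min(max(audio_level_db, 0.0), AUDIO_RANGE_DB)
    return clamped * TACTILE_RANGE_DB / AUDIO_RANGE_DB


if __name__ == "__main__":
    for level in (0.0, 65.0, 130.0):
        print(f"{level:5.1f} dB audio -> {compress_level(level):4.1f} dB vibration")
```

With such a mapping, a change in audio level is always rendered as a smaller change in vibration level, which avoids the abrupt jump from imperceptible to strong vibration described above.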

Intensity perception across the two modalities shows different behaviours. At 1 kHz, an increase of 10 dB in sound pressure level causes a sense of doubling in perceived loudness. At 250 Hz, an increase of 4–8 dB in vibration level causes a sense of doubling in perceived vibration intensity. In Fig. 12.2, the perceived intensity growth functions of auditory and tactile modalities are compared at the same frequency (250 Hz): The rate of growth for the tactile modality is higher than for the auditory modality.
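The doubling rules quoted above can be turned into a small calculation. The sketch below assumes that perceived magnitude doubles for every fixed level step, so that a level increase of ΔL corresponds to a ratio of 2^(ΔL / step). The value of 6 dB per doubling for vibration is simply the midpoint of the 4–8 dB range and is an assumption made for illustration.

```python
def perceived_ratio(level_increase_db: float, db_per_doubling: float) -> float:
    """Ratio of perceived magnitude after a level increase.

    Assumes that perceived magnitude doubles every `db_per_doubling` dB.
    """
    return 2.0 ** (level_increase_db / db_per_doubling)


if __name__ == "__main__":
    increase_db = 20.0
    loudness = perceived_ratio(increase_db, 10.0)   # 10 dB per doubling (from the text)
    vibration = perceived_ratio(increase_db, 6.0)   # assumed midpoint of 4-8 dB
    print(f"+{increase_db:.0f} dB: loudness x{loudness:.1f}, vibration x{vibration:.1f}")
```

The same level increase therefore produces a larger change in perceived vibration intensity than in perceived loudness, in line with the steeper tactile growth function.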

## *12.2.2 Frequency*

In most musical applications, the frequency spectra of auditory and vibrotactile cues are coupled to each other by physical laws. Such frequency coupling plays an important role in how humans integrate auditory and tactile information [19].

Sounds that are audible to the human ear fall in the frequency range of about 20–20,000 Hz, with highest sensitivity between 500 and 4000 Hz. Just-noticeable frequency differences (JNFDs) for the auditory system were reported by Zwicker and Fastl [25]. They found that, at frequencies below 500 Hz, humans are able to differentiate between two tone bursts with a frequency difference of only about 1 Hz, and this value increases with frequency. Above 500 Hz, the JNFD is approximately 0.002 times the frequency.

The frequency range of auditory perception is much wider than that of tactile perception: The skin is sensitive to frequencies between 1 and 1000 Hz, with highest sensitivity in the range of 200–300 Hz. JNFDs for sinusoidal vibrations and tactile pulses on the finger and volar forearm were measured by different researchers [25–27]. The values for the Weber fraction (difference threshold divided by the reference stimulus value, here its frequency) range from 0.07 to 0.2. Frequency discrimination of the tactile channel is fairly good at low frequencies but deteriorates rapidly as frequency increases [25].

Overall, these results indicate that the skin is rather poor at discriminating frequency in comparison with the ear.
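To make the comparison concrete, the following sketch evaluates the figures quoted in this section at a given frequency. It treats the auditory JNFD as a constant 1 Hz below 500 Hz, which simplifies the statement that the value is "about 1 Hz", and it applies the reported Weber fractions of 0.07 and 0.2 as lower and upper bounds for touch. The function names are illustrative.

```python
def auditory_jnfd_hz(frequency_hz: float) -> float:
    """Approximate auditory JNFD: about 1 Hz below 500 Hz, 0.002 * f above."""
    if frequency_hz < 500.0:
        return 1.0  # simplification: constant value below 500 Hz
    return 0.002 * frequency_hz


def tactile_jnfd_range_hz(frequency_hz: float) -> tuple[float, float]:
    """Tactile JNFD bounds from the reported Weber fractions (0.07 to 0.2)."""
    return 0.07 * frequency_hz, 0.2 * frequency_hz


if __name__ == "__main__":
    f = 250.0
    low, high = tactile_jnfd_range_hz(f)
    print(f"At {f:.0f} Hz: auditory JNFD ~{auditory_jnfd_hz(f):.1f} Hz, "
          f"tactile JNFD ~{low:.1f} to {high:.1f} Hz")
```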

## *12.2.3 Temporal Acuity and Rhythm Perception*

Conversely, the auditory modality shows an extraordinary temporal resolution. As an example, two impulses will be perceived as separate sounds even if the gap between them is only 1–2 ms. Although the temporal acuity of the cutaneous system is not as high as that of the auditory system, individuals can still distinguish a gap of 8–10 ms between two tactile sinusoidal bursts [28, 29]. In any case, in comparison with vision, both audition and vibrotaction have very high temporal resolution.

Apart from temporal acuity, the perception of rhythm is an important capability of both modalities. In all cultures, it is common that people tap or move their hand, foot or other body parts in synchrony with music [30]. The processing of such metric information is only possible through the auditory and tactile/somatosensory channels, but not by means of vision. A research study by Brochard and colleagues shows that humans can abstract the metric structure from tactile rhythmic sequences as efficiently as from equivalent auditory patterns [31]. This ability is independent of musical expertise. Various scientists assume that an early-developing relationship between the auditory modality and movement-related sensory inputs is maintained in adulthood [32]. The results of Bresciani et al. [33] show that the visual modality alone plays a minor role in feeling the contact with objects, at least when tactile and auditory modalities are available.

## *12.2.4 Synchrony*

Temporal correlation is an important cue for the brain to integrate multiple sensory inputs generated by a single event, as well as to differentiate inputs related to separate events occurring at the same time. However, the synchronization of different modalities in multimedia applications is a major issue, due to technical constraints such as data transfer time, computer processing time and delays that occur during feedback generation processes. As the asynchrony between different modalities increases, the sense of presence and realism of multimedia applications decrease.

Several results are available on audio–tactile asynchrony perception [34, 35], indicating that, in order to preserve a unitary percept, the temporal discrepancy between the auditory and tactile modalities must be within 25 ms for various multimedia systems. However, for the purpose of the discussion in this chapter, it is necessary to consider the literature focusing on touchscreens. Kaaresoja has measured the tolerable multimodal latency in mobile touchscreen virtual button interaction, showing that tactile feedback latency should not exceed 25 ms and audio feedback latency should not exceed 100 ms [36]. Unfortunately, most of the current mobile phones or tablets cannot fulfil these latency figures. Such latency issues have a negative effect on the quality of musical interaction. Therefore, the progress of multimodal technology with respect to synchrony and latency will play an important role for the success of musical touchscreen applications.
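The latency figures reported for virtual buttons can be used as a simple budget check when profiling a device. The sketch below is only an illustration: the thresholds are taken from the text above, while the function name and the measured values are hypothetical.

```python
TACTILE_MAX_MS = 25.0   # tolerable tactile feedback latency (from the text)
AUDIO_MAX_MS = 100.0    # tolerable audio feedback latency (from the text)


def latency_report(tactile_ms: float, audio_ms: float) -> dict[str, bool]:
    """Check measured feedback latencies against the tolerable limits."""
    return {
        "tactile_ok": tactile_ms <= TACTILE_MAX_MS,
        "audio_ok": audio_ms <= AUDIO_MAX_MS,
    }


if __name__ == "__main__":
    # Hypothetical measurements for some device, not data from the chapter.
    print(latency_report(tactile_ms=40.0, audio_ms=80.0))
```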

## **12.3 Experiment 1: Identification of Audio-Driven Tactile Feedback on a Touchscreen**

Grooveboxes can be considered as a combination of a control surface, a sampler, a music sequencer and a drum computer. They are popularly used for the production of various kinds of loop-based music styles, such as electro, techno, hip hop, especially in live concerts. Touchscreen-based grooveboxes may enable the user to redefine the combination, organization and size of the knobs, sliders, buttons [20]. In groovebox applications, the possibility to identify and discriminate the available musical loops is crucial to the user. A series of four experiments (referred to as 1a–d) were set up, whereby tactile feedback was generated from audio signals based on four different approaches. Tactile signal parameters were systematically varied according to the perceptual characteristics discussed in Sect. 12.2. The objective was to test which tactile feedback processing strategies helped distinguish audio loops. Furthermore, the attractiveness of the system, including pragmatic and hedonic qualities, was evaluated.

## *12.3.1 Stimuli*

The main discriminant acoustic features of musical instruments are the frequency and amplitude structure, and temporal envelope of the produced tones. Most percussive instruments are unpitched (e.g. the snare), while others excite auditory pitch perception (e.g. the kettledrum). Features such as melody, rhythm and dynamics must be processed to some extent to generate a suitable vibrotactile signal from the acoustical signal. To this end, various strategies have been applied in the experiments reported in this chapter, similar to what is described in Sect. 7.3.

The simplest way to generate tactile feedback from acoustic signals is by lowpass filtering, as done in experiments 1a and 1d with cut-off frequency set to 1 kHz. As discussed already, auditory and tactile signals have strong similarities in the frequency domain. However, the tactile system is not sensitive to frequencies above 1 kHz.
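As an illustration of this strategy, the following sketch low-pass filters a signal with a 1 kHz cut-off. The chapter does not state the filter type or order, so the first-order (one-pole) recursive filter used here, the 44.1 kHz sample rate and all function names are assumptions.

```python
import math


def one_pole_lowpass(samples, cutoff_hz=1000.0, sample_rate=44100.0):
    """First-order recursive low-pass filter (assumed filter type)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)   # move the output a fraction alpha towards the input
        out.append(y)
    return out


def sine(freq_hz, n=4410, sample_rate=44100.0):
    """Unit-amplitude test tone of n samples."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def rms(xs):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))


if __name__ == "__main__":
    for freq in (100.0, 8000.0):
        tone = sine(freq)
        gain = rms(one_pole_lowpass(tone)) / rms(tone)
        print(f"{freq:6.0f} Hz tone: gain {gain:.2f}")
```

A first-order filter rolls off gently, so a practical implementation would probably use a steeper design. The sketch only shows that components well below 1 kHz pass almost unchanged while components well above it are attenuated.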

Experiment 1b investigated the use of a frequency-shift strategy to generate vibrotactile feedback from the original audio signal. Assuming that good integration between auditory and tactile information occurs when the acoustical frequency is a harmonic of the vibration frequency, the spectrum of the audio signal was shifted down one octave by means of a granular synthesis technique. While this preserved accurate timing, the processing resulted in some unwanted artefacts. However, such artefacts are produced especially at higher frequencies, mostly above the range of tactile perception (see Sect. 4.2).

In experiment 1c, beat information was extracted from the audio loops by looking for fast attack transients in the amplitude envelope. The detected beats triggered sinusoidal pulses at 100 Hz lasting 80 ms, which are easily perceived.
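The beat-detection strategy can be sketched as follows. The chapter does not describe the detector, so this sketch assumes a frame-wise RMS envelope and flags a beat whenever the envelope rises by more than a fixed amount from one frame to the next. The frame size, the rise threshold, the synthetic test loop and all function names are assumptions; only the 100 Hz, 80 ms pulse comes from the text.

```python
import math


def detect_onsets(samples, frame=256, rise=0.1):
    """Return start indices of frames whose RMS envelope jumps by more than `rise`."""
    onsets, prev = [], 0.0
    for start in range(0, len(samples) - frame + 1, frame):
        chunk = samples[start:start + frame]
        env = math.sqrt(sum(x * x for x in chunk) / frame)
        if env - prev > rise:          # fast attack in the amplitude envelope
            onsets.append(start)
        prev = env
    return onsets


def render_pulses(onsets, length, sample_rate, freq_hz=100.0, dur_s=0.08):
    """Replace each detected beat by a 100 Hz sinusoidal pulse lasting 80 ms."""
    out = [0.0] * length
    n = round(dur_s * sample_rate)
    for onset in onsets:
        for i in range(min(n, length - onset)):
            out[onset + i] = math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
    return out


def demo_loop(sample_rate=8000, hits=(1024, 4096)):
    """Synthetic one-second loop with two decaying 1 kHz bursts (illustrative only)."""
    signal = [0.0] * sample_rate
    for hit in hits:
        for i in range(800):
            signal[hit + i] = math.exp(-i / 400.0) * math.sin(
                2.0 * math.pi * 1000.0 * i / sample_rate)
    return signal


if __name__ == "__main__":
    loop = demo_loop()
    beats = detect_onsets(loop)
    vibration = render_pulses(beats, len(loop), 8000)
    print("beats at samples:", beats)
```

Because every beat is rendered with the same pulse, the resulting vibration keeps the rhythm of the loop but discards its spectral content.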

## *12.3.2 Set-up*

An Apple iPod Touch<sup>4</sup> was used as the touch-sensitive input device, while tactile feedback was delivered by an electrodynamic exciter (Monacor BR-25) mounted at the back of the iPod (see Fig. 12.3). Its touchscreen surface was divided into six virtual buttons, each of which corresponded to a specific audio loop. When the participant pressed a button, tactile feedback for the respective channel was rendered in real time using Pure Data, while the audio signals were reproduced by closed-back reference headphones (Sennheiser HDA 200). The headphones offered effective sound isolation and therefore masked the background noise generated by the tactile system. The task was to associate each vibrating button to the corresponding audio signal.

## *12.3.3 Subjects*

Twenty subjects, sixteen male and four female, aged between 20 and 40 years, participated in the experiment. They had no knowledge of acoustics, and they voluntarily participated in this study. All subjects were right-handed and had self-reported normal hearing.

<sup>4</sup>https://en.wikipedia.org/wiki/IPod\_Touch (last accessed on 15 Nov 2017).

**Fig. 12.3** Touchscreen device was mounted on an electrodynamic shaker for vibration reproduction

## *12.3.4 Results and Discussion*

In this section, the results of the identification investigations for different signal processing strategies are summarized.

#### **12.3.4.1 Low-Pass Filtering**

In experiment 1a, the six vibrotactile stimuli were generated by low-pass filtering the audio loops at 1 kHz.

The percentages of correct responses for the stimuli are shown in Fig. 12.4a. Subjects could correctly identify most of the instruments. Errors are particularly low for percussion instruments which generate mainly higher frequencies, such as the snare, hi-hat or tambourine: The percentage of correct responses for snare, hi-hat and tambourine is higher than 80%. The participants reported that temporal envelope and frequency content were important cues.

#### **12.3.4.2 Pitch Shifting**

In experiment 1b, the vibration signals were generated by shifting the spectra of the audio loops down by one octave. The resulting signals were low-pass filtered at 1 kHz to remove high-frequency artefacts due to the processing.

The percentages of correct responses for the six stimuli are shown in Fig. 12.4b. Compared to simple low-pass filtering, octave shifting improved the identification of the loops. Indeed, pitch shifting made it possible to perceive important components of the original sounds through the tactile sense. For instance, the attack of the kick drum presents relevant content at frequencies above 1 kHz. The kick drum and shaker could be better identified than in the low-pass filtering condition, but there were slightly more errors between the hi-hat and snare, perhaps because the hi-hat was perceived as more intense than before, as its dominant high-frequency energy was shifted towards lower frequencies. However, it is unclear whether features of the sequence (e.g. rhythm) or features of the source (e.g. frequency content) or both influenced the results; therefore, experiments 1c and 1d focused on separating the sequence and source features.

**Fig. 12.4** Results of the identification experiment for different percussive instruments (audio loops). The vibration signals were generated by processing the audio signal via **a** low-pass filtering with cut-off at 1 kHz and **b** pitch shifting one octave down

#### **12.3.4.3 Beat Detection**

In experiment 1c, the individual loops were analysed and their beats were detected, which in turn triggered artificial vibration signals. Thus, source features such as frequency content were not conveyed by the vibration signal, while the original rhythmic sequence was preserved.

Results are shown in Fig. 12.5a. While rhythm is an important factor for loop identification, the overall detection rate decreased. This showed that other features of musical signals play an important role.

**Fig. 12.5** Identification results for different instruments. The vibration signals were generated using **a** sequence features (beat detection and signal substitution) and **b** source features (low-passed percussive hits)

#### **12.3.4.4 Single Hits**

In experiment 1d, rhythm (sequence) information was removed to test whether a percussion instrument could be identified with only source features; thus, only a single hit was reproduced. Accordingly, the bass line and tambourine loops were removed from the stimuli set, and other percussion sounds (guiro and handclap) with distinct source features were added. The vibration signals were generated by low-pass filtering single hits at 1 kHz.

As seen in Fig. 12.5b, the kick drum and snare were identified with 100% accuracy, possibly due to their characteristic frequency content, which resulted in clearly distinct tactile perceptual qualities. Of the remaining instruments, the guiro had the highest number of correct identifications, perhaps because of its typical time structure (rattle like) that distinguishes it from the instruments with different time structures (bang like). The high-frequency percussive sounds were not differentiated well. Subsequent experiments revealed that the detection rate did not improve with octave shifting the single hits, or by adding a preliminary training phase.

#### **12.3.4.5 Summary**

The best identification rates were obtained when the source and sequence features were preserved (low-pass filtered or octave-shifted signals). Identification relying on rhythm information (beat detection) was observed to be time consuming and varied largely between subjects: The average identification time was approximately 10 s per loop in experiment 1c, while only 6 s were needed in experiments 1a and 1b and 8 s in the case of 1d.

## *12.3.5 Usability and Attractiveness*

Before and after the experiments reported above, participants were asked to mix the six audio loops into a 90 s composition using the set-up described in Sect. 12.3.2. Instead of buttons, six faders were used to blend the different audio signals. In the first set, a conventional groovebox without tactile feedback was simulated. In the second set, audio-driven tactile feedback was rendered using the octave shift approach described in Sect. 12.3.4.2. When the finger of the user came in contact with a fader, vibration for the respective channel was rendered.

After completion, participants were asked to judge the usability and attractiveness of the groovebox using the AttrakDiff [37] semantic differential. This method uses pairs of bipolar adjectives to evaluate the pragmatic and hedonic qualities of interactive products. The adjectives, grouped under four categories, and the corresponding mean semantic ratings across participants are reported in Fig. 12.6. The *pragmatic quality* is on average better without tactile feedback; this was likely due to participants experiencing some difficulty with audio–tactile association in the prior experiments. The individual ratings for the tactile feedback set-up varied, indicating disagreement between subjects. However, the difference in *pragmatic quality* is not statistically significant (dependent t test for paired samples, p > 0.05). On average, the *hedonic quality* was better with tactile feedback, especially for the "stimulation" aspect (p < 0.05). The hedonic category "stimulation" refers to the ability of a product to support the user's further personal development. The groovebox with audio-driven tactile feedback was rated as more innovative, captivating and challenging. These results are in agreement with other studies that evaluated multimodal feedback [38]. The overall attractiveness of the groovebox remains the same with or without audio-driven tactile feedback. This result is reasonable if the attractiveness is understood based on the hedonic and pragmatic qualities, where each contributes in equal parts to the attractiveness of a product [35].
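For readers unfamiliar with the dependent t test for paired samples mentioned above, the following sketch computes its test statistic from two sets of ratings given by the same participants. The ratings in the example are hypothetical and serve only to show the calculation; they are not data from the study.

```python
import math
import statistics


def paired_t_statistic(with_feedback, without_feedback):
    """Return (t, degrees of freedom) for a dependent-samples t test."""
    diffs = [a - b for a, b in zip(with_feedback, without_feedback)]
    n = len(diffs)
    # t = mean difference divided by its standard error
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Hypothetical ratings on a 7-point scale, NOT data from the study.
    with_tactile = [5, 6, 4, 6, 5, 7]
    without_tactile = [4, 4, 4, 5, 3, 5]
    t, dof = paired_t_statistic(with_tactile, without_tactile)
    print(f"t = {t:.2f} with {dof} degrees of freedom")
```

The p value is then obtained from the t distribution with n − 1 degrees of freedom, which requires a statistics package beyond the Python standard library.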

Obviously, the presented data are only valid for the specific exercise and the laboratory conditions described above, while results might change depending on task and context. For example, in a real live set it might be more important to know if a finger is on the correct fader; tactile feedback might also help DJs match beats between different tracks, influencing their pragmatic quality perception. Thus, conclusions should be drawn carefully.

In most touchscreen-based consumer devices, such as mobile phones and tablets, smaller low-fidelity actuators are used instead of the electrodynamic exciter that was used in the described experiments. Small actuators have several limitations in terms of the achievable vibration intensity and frequency range. Additionally, they have a slow temporal response in comparison with other technologies, such as voice coil or piezoelectric actuators (see Sect. 13.2 for a review of actuator technology). To overcome such limitations, multimodal interaction can be very promising as it can compensate for what is lacking in one modality with higher fidelity in another channel. In this perspective, a further experiment was conducted to investigate crossmodal intensity interaction between the auditory and tactile channels.

**Fig. 12.6** Mean values of the AttrakDiff semantic differential for seven items on each of the four dimensions: *pragmatic quality*, *hedonic quality*—*identity*, *hedonic quality*—*stimulation* and *attractiveness*

## **12.4 Experiment 2: Effect of Loudness on Perceived Tactile Intensity of Virtual Buttons**

For several conventional or digital musical instruments, one fundamental interaction is that of pressing a button or a key [39]. Also, interaction with the user interface of DMIs (e.g. a groovebox) or mixing consoles is often mediated by buttons. This experiment aims to investigate the effect of loudness on the perceived intensity of tactile feedback provided by a touchscreen.

## *12.4.1 Stimuli*

An impulsive waveform, representing the feedback produced by a conventional button, was selected as the tactile signal. The stimulus amplitude corresponds to the perpendicular displacement of the surface, with positive values indicating movement towards the subject. In order to be compatible with the characteristics of small actuators, a relatively small amplitude was selected. The maximum amplitude of the stimulus, which occurs at the beginning of the interaction, is 20 µm. The amplitude of the impulse then decays exponentially over 100 ms. As the audio signal, a 400 Hz decaying sinusoid, also lasting 100 ms, was selected. Its initial (and maximum) sound pressure level could be set to 50, 60 or 70 dB. Again, an exponential decay was applied.
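The two stimuli described above can be sketched numerically as follows. This is only an illustration, not the authors' signal generation code: the sample rate, the total amount of decay reached after 100 ms (`DECAY_DB`) and the function names are assumptions, since the text gives only the peak values, the duration and the exponential shape.

```python
import numpy as np

FS = 48_000        # sample rate in Hz (assumption, not stated in the text)
DURATION = 0.100   # both stimuli last 100 ms (from the text)
DECAY_DB = 60.0    # assumed total decay reached at the end of the 100 ms window

def decay_envelope(n, decay_db=DECAY_DB):
    """Exponential envelope falling from 1 to 10**(-decay_db/20) over n samples."""
    return 10.0 ** (-decay_db / 20.0 * np.arange(n) / (n - 1))

def tactile_impulse(peak_um=20.0, fs=FS):
    """Surface displacement in micrometres: 20 um peak at onset, then exponential decay."""
    n = int(round(DURATION * fs))
    return peak_um * decay_envelope(n)

def level_gain(level_db, ref_db=70.0):
    """Linear gain of a given sound level relative to the loudest condition (70 dB)."""
    return 10.0 ** ((level_db - ref_db) / 20.0)

def audio_tone(level_db, freq=400.0, fs=FS):
    """400 Hz decaying sinusoid, normalised so that the 70 dB condition peaks near 1."""
    n = int(round(DURATION * fs))
    t = np.arange(n) / fs
    return level_gain(level_db) * np.sin(2.0 * np.pi * freq * t) * decay_envelope(n)

tactile = tactile_impulse()
tones = {level: audio_tone(level) for level in (50, 60, 70)}
```

The audio waveforms are expressed relative to the 70 dB condition only; mapping them to absolute sound pressure would require calibrating the playback chain, which is outside the scope of this sketch.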

## *12.4.2 Set-up*

The experiment made use of the same hardware set-up as in experiment 1 (see Sect. 12.3.2). In this case, the surface of the touchscreen was divided into two virtual buttons.

## *12.4.3 Subjects*

Eighteen subjects, twelve male and six female, aged between 20 and 35 years, participated in this experiment. The subjects had no specific knowledge of acoustics, and they participated voluntarily. All subjects were right-handed and had self-reported normal hearing.

## *12.4.4 Procedure*

The task was to estimate the intensity of the feedback delivered by the virtual button. Participants were instructed to concentrate only on the tactile feedback. The magnitude estimation method with anchor stimulus was used [40]. After the tactile-only anchor stimulus, a test stimulus was presented and participants had to assign a number proportional to their subjective impression of the stimulus intensity relative to the anchor stimulus, assuming that the intensity of the latter corresponded to 100.

If participants did not perceive the test stimulus, they had to assign 0. Each stimulus pair was presented ten times in random order.

## *12.4.5 Results and Discussion*

Figure 12.7 shows the responses of all subjects. Geometric mean values were computed for the magnitude estimates obtained from all subjects for each stimulus condition.
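As a minimal sketch of this averaging step, the geometric mean of magnitude estimates can be computed in the log domain. The ratings below are invented for illustration and are not data from the experiment. Excluding zero responses is also an assumption made here, because the logarithm of zero is undefined; the text does not say how such responses were handled.

```python
import numpy as np

def geometric_mean(estimates):
    """Geometric mean of magnitude estimates, computed via the mean of logarithms.

    Zero responses ("not perceived") are dropped here; this handling is an
    assumption of the sketch, not a detail reported in the chapter.
    """
    x = np.asarray(estimates, dtype=float)
    x = x[x > 0]
    return float(np.exp(np.mean(np.log(x))))

# Hypothetical ratings relative to an anchor of 100 (NOT experimental data)
ratings = {
    "only tactile": [90, 100, 110, 105],
    "audio-tactile 70 dB": [150, 200, 180, 220],
}
means = {condition: geometric_mean(values) for condition, values in ratings.items()}
increase_percent = 100.0 * (means["audio-tactile 70 dB"] / means["only tactile"] - 1.0)
```

The geometric mean is the usual choice for magnitude estimation data because ratings are ratio judgments: it averages ratios rather than differences and is less affected by a few very large numbers.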

All audio–tactile conditions produced higher estimates than the *only*-*tactile* condition. Dependent t tests of the means showed that three conditions (*only tactile*, *audio*–*tactile 50 dB* and *audio*–*tactile 70 dB*) differed significantly (p < 0.05).

The results show that if a tactile button feedback is combined with audio feedback, the perceived intensity of the tactile feedback increases. When the tactile stimulus was accompanied by the acoustic stimulus, the tactile intensity was perceived on average between 56 and 96% higher.

The perceived tactile intensity magnitude increased for increasing sound levels, in spite of no change in the actual tactile feedback level. Similarly, in a previous investigation the authors found that, for a virtual drum, the magnitude of force feedback strength increased with increasing loudness, in spite of no change in force feedback [19].

Overall, these results indicate that auditory information can be useful in overcoming the current limitations of haptic devices.

## **12.5 Conclusions**

In this chapter, first the fundamental perceptual aspects of auditory and tactile perception were discussed focusing on musical touchscreen applications. Based on this knowledge, various audio–tactile signal generation techniques were introduced and evaluated.


In a first series of experiments, it was found that percussive instruments can be identified to some degree if audio-driven tactile feedback is rendered. The detection rate was best when source characteristics and rhythmic features were maintained while translating from audio to tactile signals. A qualitative study showed that tactile feedback can improve the quality of touchscreen-based music interfaces and make them more attractive for the users.

A second investigation based on the same set-up focused on the perceived tactile feedback intensity of virtual buttons, showing that this can be significantly influenced by parallel auditory feedback. This result may be used to compensate for the limitations of current small actuator technology as found in consumer devices. The coupled perception of sound and vibration is important for the implementation of innovative touch-based musical interaction, and tactile feedback is useful to enrich the musical interaction.

## **References**



# **Chapter 13 Implementation and Characterization of Vibrotactile Interfaces**

**Stefano Papetti, Martin Fröhlich, Federico Fontana, Sébastien Schiesser and Federico Avanzini**

**Abstract** While a standard approach is more or less established for rendering basic vibratory cues in consumer electronics, the implementation of advanced vibrotactile feedback still requires designers and engineers to solve a number of technical issues. Several off-the-shelf vibration actuators are currently available, having different characteristics and limitations that should be considered in the design process. We suggest an iterative approach to design in which vibrotactile interfaces are validated by testing their accuracy in rendering vibratory cues and in measuring input gestures. Several examples of prototype interfaces yielding audio-haptic feedback are described, ranging from open-ended devices to musical interfaces, addressing their design and the characterization of their vibratory output.

## **13.1 Introduction**

The use of cutaneous feedback, in place of a full-featured haptic experience, has recently received increased attention in the haptics community [5, 31], both at research level and industrial level. Indeed, enabling vibration in consumer devices, especially portable ones, is far more practical than providing motion and force feedback to the user, which would generally result in bulky and mechanically complex implementations requiring powerful motors. Recently, several studies have been conducted on the use of vibratory cues as a sensory substitution method to convey pseudo-haptic effects, e.g., to simulate textures [2, 26], moving objects [43], forces [14, 25, 29, 35], or alter the perceived nature and compliance of materials [30, 32, 41]. Other studies exist that assessed intuitiveness of vibrotactile feedback with untrained subjects [21] and how it may improve user performance after training [38].

S. Papetti (✉) · M. Fröhlich · S. Schiesser
ICST, Institute for Computer Music and Sound Technology, Zürcher Hochschule der Künste, Pfingsweidstrasse 96, 8005 Zurich, Switzerland e-mail: stefano.papetti@zhdk.ch

M. Fröhlich e-mail: martin.froehlich@zhdk.ch

S. Schiesser e-mail: sebastien.schiesser@zhdk.ch

F. Fontana Dipartimento di Scienze Matematiche, Informatiche e Fisiche, Università di Udine, via delle Scienze 206, 33100 Udine, Italy e-mail: federico.fontana@uniud.it

F. Avanzini Dipartimento di Informatica, Università di Milano, Via Comelico 39, 20135 Milano, Italy e-mail: federico.avanzini@di.unimi.it

Among the approaches adopted to design vibrotactile feedback for non-visual information display, complex semantics have been investigated [20] on top of simpler vibrotactile codes [3, 22]. Focusing in particular on DMIs, the most straightforward solution is to obtain tactile signals directly from their audio output. In practice, this may be done either by rendering to the skin the vibratory by-products generated by embedded loudspeakers—for instance, this may occur as a side effect while playing some inexpensive digital pianos for home practicing—or, using a slightly more sophisticated technique, by feeding dedicated vibrotactile actuators with the same signals used for auditory feedback [12]. In spite of the minimal design effort, these approaches have the potential to result in a credible multimodal experience. Sound and vibration are in fact tightly coupled phenomena, as sound is the acoustic manifestation of a vibratory process. However, these simple solutions overlook a number of spurious and unwanted issues such as odd coupling between the electroacoustic equipment and the rest of the instrument, and unpredictable nonlinearities in the vibrotactile response of the setup [10]. A more careful design should be adopted instead, in which vibrotactile signals are tailored to match human vibrotactile sensitivity (see Sect. 4.2) and adapted to the chosen actuator technology. In musical interfaces, this can be generally done by equalizing the original audio signal with respect to both its overall energy and frequency content, as discussed in more detail in Sect. 13.3 of this chapter.

To make sure that newly developed musical haptic devices actually render feedback as designed, we suggest that they should undergo characterization and validation procedures. The literature of touch psychophysics shows that divergent results are possible, due to the varying accuracy of haptic devices [23, 36]. As an example, when studying vibrotactile sensitivity the characterization of vibratory output would allow experimenters to compare the stimuli actually delivered to the skin with the original stimuli fed in the experimental device. Notably, a similar practice is routinely implemented in psychoacoustic studies where, e.g., the actual sound intensity reaching the participants' ears is usually measured and reported together with other experimental data. Particular attention should also be devoted to analyzing the mechanical coupling between a vibrotactile interface and the skin, as that is ultimately how vibratory stimuli are conveyed [27]. However, as discussed in Sect. 4.1, this may turn out especially difficult when targeting everyday interaction involving active touch, as opposed to controlled passive settings that are only possible in a laboratory. Once characteristics have been measured, they may guide the iterative design and refinement of haptic interfaces and may offer experimenters a more insightful interpretation of experimental results.

In what follows, we first discuss readily available technology that is suitable for implementing vibrotactile feedback in musical interfaces and then describe the design and characterization of a few exemplary devices that were recently developed by the authors for various purposes.

## **13.2 Vibrotactile Actuators' Technology**

When selecting vibrotactile actuators, designers and engineers need to consider factors such as cost, size, shape, power and driving requirements, frequency, temporal, and amplitude response [5]. For rendering effective tactile feedback, such responses should at least be compatible with results of touch psychophysics. Also, to grant versatility in the design of vibrotactile cues, actuators' frequency response and dynamic range should be as wide as possible, and their onset/stop time negligible. For example, while it is known that piano mechanics results in variable delay between action and audio-tactile feedback [1], to have full control over this aspect while designing keyboard-based DMIs, audio and tactile devices should offer the lowest possible latency [7, 17].

Among the currently available types of actuators suitable to convey vibrotactile stimuli, the more common ones are as follows: eccentric rotating mass (ERM) actuator, voice coil actuator (VCA), and piezoelectric actuator [5, 24].

ERM actuators make use of a direct current (DC) motor, which spins an eccentric rotating mass. They come in various designs with different form factors, ranging from cylinders to flat 'pancakes.' This technology has two main downsides: The first one is that vibration frequency and amplitude are interdependent, as the rotational speed (frequency), which is proportional to the applied voltage, is also proportional to the generated vibration amplitude; the second one is that, mainly due to its inertia, the rotating mass requires some time to reach a target speed. Overall, these issues make ERM unsuitable to reproduce audio-like signals that have rich frequency content and fast transients. Despite these limitations, thanks to their simple implementation ERM actuators have been commonly used in consumer electronics such as mobile phones and game devices.
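The coupling between frequency and amplitude can be illustrated with an idealized model of an unbalanced rotor, in which the centrifugal force grows with the square of the rotation speed. The sketch below assumes this textbook model, ignores motor dynamics and mounting, and uses invented rotor parameters; it is not a description of any specific actuator.

```python
import math

def erm_force_amplitude(ecc_mass_kg, radius_m, rotation_hz):
    """Peak centrifugal force (N) of an ideal unbalanced rotor: F = m * r * omega**2."""
    omega = 2.0 * math.pi * rotation_hz
    return ecc_mass_kg * radius_m * omega ** 2

# Hypothetical rotor: 0.5 g eccentric mass at 1.5 mm from the shaft axis
for rotation_hz in (50, 100, 200):
    force = erm_force_amplitude(0.5e-3, 1.5e-3, rotation_hz)
    print(f"{rotation_hz} Hz -> {force:.3f} N")
```

In this model, the driving voltage sets the rotation speed, and the speed fixes both the vibration frequency and the force amplitude, so the two cannot be chosen independently.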

VCAs are driven by alternating current (AC) and consist of an electrically conductive coil (usually made of copper) interacting with a permanent magnet. Two main VCA types are available, using either a moving coil or a moving, suspended magnet. The functioning principle of moving coil VCAs is similar to that of the loudspeaker, except that, instead of a membrane producing sound pressure waves, there is a moving mass generating vibrations. Moving coil VCAs are generally designed to move small masses, and since their output energy in the lower frequency range is constrained by the size of the moving mass, they cannot produce substantial low-frequency vibration. Conversely, moving magnet VCAs are of greater interest for vibrotactile applications as they can generally provide higher energy in the lower frequency band. However, to keep them compact and light, a smaller moving mass must be compensated by a larger peak-to-peak excursion, complicating the suspension design [44]. Linear resonant actuators (LRAs) are particular voice coil designs that use a moving magnetic mass attached to a spring. They are meant to produce fixed-frequency vibration at the resonant frequency of the spring–mass system, and are therefore highly power-efficient. Because of their higher power efficiency and compactness compared to ERM actuators, LRAs are becoming the preferred choice for use in consumer electronics, at the cost of a more complex driving circuit. In general, though, VCAs offer wide-band frequency operation and quick response times, making them suitable for audio-like input signals with complex frequency content and fast transients.
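The fixed operating frequency of an LRA follows from the resonance of an ideal, undamped spring–mass system, f0 = (1/2π)·√(k/m). The sketch below assumes that idealization; the moving mass and the 175 Hz target are invented values chosen only for illustration.

```python
import math

def resonant_frequency_hz(stiffness_n_per_m, mass_kg):
    """Resonant frequency of an ideal undamped spring-mass system."""
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2.0 * math.pi)

def stiffness_for(target_hz, mass_kg):
    """Spring stiffness (N/m) that places the resonance at target_hz."""
    return mass_kg * (2.0 * math.pi * target_hz) ** 2

# Hypothetical LRA: 1 g moving mass tuned to resonate at 175 Hz
mass = 1.0e-3
stiffness = stiffness_for(175.0, mass)
print(stiffness, resonant_frequency_hz(stiffness, mass))
```

Driving such a system away from its resonance yields much weaker vibration, which is why LRAs are efficient at one frequency but poorly suited to wide-band, audio-like signals.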

Piezoelectric materials deform proportionally to an applied electric field, or conversely develop an electric charge proportional to the applied mechanical stress. For this reason, they can be used both as sensors and actuators. In the latter case, they may be driven by either DC or AC. Since piezoelectric actuators have no moving parts and no friction is produced, they present minimal aging effects and are generally regarded as highly robust. Variations of size, form, and cost/quality factors are available, ranging from ultra-cheap thin piezo disks to high-performance devices made of stacked piezoelectric elements (e.g., used for precision positioning). Piezo actuators have extremely fast response times, and their frequency range can be very wide (although not particularly in the lower band), so they may be used, e.g., as extremely compact loudspeakers or to generate ultrasound. Since they do not generate magnetic fields while operating, they are suitable when space is tight and insulation from other electronic components is not possible. On the downside, while their current consumption is low (similar to LRAs), compared to VCAs and ERM actuators they require a higher input voltage to operate, up to a few hundred volts. Therefore, they usually need special driving electronics to be used with audio signals.

Several solutions are available for controlling the above types of actuators, both in the form of hardware and software. Hardware solutions are typically driving circuits used to condition input signals to conform with target actuator specifications,<sup>1</sup> while software solutions include libraries of pre-recorded optimized input signals to achieve different effects in interactive applications.<sup>2</sup>

## **13.3 Interface Examples**

## *13.3.1 The Touch-Box*

The Touch-Box is an interface originally developed for conducting experiments on human performance and psychophysics under vibrotactile feedback conditions. The device, shown in Fig. 13.1, measures normal forces applied to its top panel, which provides vibrotactile feedback. An early prototype was used to study how auditory, tactile, and audio-tactile feedback affect the accuracy of finger pressing force [18]. A more recent psychophysical experiment, described in Sect. 4.2 and making use of the more advanced prototype described below, investigated how vibrotactile sensitivity is influenced by actively applied finger pressing forces of various intensities.

<sup>1</sup>See, for instance, www.ti.com/haptics (last accessed on Nov 29, 2017).

<sup>2</sup>For example, see Immersion TouchSense technology: www.immersion.com (last accessed on Nov 29, 2017).

**Fig. 13.1** The Touch-Box interface. Figure reprinted from [33]

#### **13.3.1.1 Implementation**

For the latter experiment, a high-fidelity version of the Touch-Box was developed. Load cell technology was selected for force sensing, thanks to its superior reliability and reproducibility of results: A CZL635 load cell was chosen, capable of measuring forces up to 49 N. For vibrotactile feedback, a Tactile Labs Haptuator mark II<sup>3</sup> was used: a VCA with moving magnet suitable to render vibration up to 1000 Hz. An Arduino UNO computing platform<sup>4</sup> receives the analog force signal from the load cell and samples it uniformly at 1920 Hz with 10-bit resolution [6]. The board is connected via USB to ad hoc software developed in the Pure Data environment and run on a host computer. The software receives force data and uses them to synthesize vibrotactile signals in return. These are routed as audio signals through an RME Fireface 800 audio interface<sup>5</sup> feeding an audio amplifier connected to the actuator. The device also measures the area of contact of a finger touching its top surface. Similar to the technological solution described in [42], a strip of infrared LEDs was attached at one side of the top panel, which is made of transparent Plexiglas: In this way, a finger pad touching the surface is illuminated by the infrared light passing through it. A miniature infrared camera placed under the top panel captures high-resolution (1280 × 960 pixels) images at 30 fps and sends them via USB to video processing software developed in the Max/MSP/Jitter environment, where finger contact area is estimated.

<sup>3</sup>http://tactilelabs.com/products/haptics/haptuator-mark-ii-v2/ (last accessed on Dec. 21, 2017).

<sup>4</sup>https://store.arduino.cc/usa/arduino-uno-rev3 (last accessed on Dec. 21, 2017).

<sup>5</sup>http://www.rme-audio.de/en/products/fireface\_800.php (last accessed on Dec. 21, 2017).

The mechanical construction of the interface was iteratively refined, so as to optimize the response of the force sensor and vibrotactile actuator. For instance, since the moving magnet of the Haptuator moves along its longitudinal direction, the actuator was suspended and mounted perpendicularly at the lower side of the Touch-Box top panel, thus maximizing the amount of energy conveyed to it. Special care was devoted to preventing coupling of the Haptuator with the rest of the structure, which could generate spurious resonances and dissipate energy. Various weight and thickness values of the Plexiglas panel were also tested, with the purpose of minimizing nonlinearities in the produced vibration, while keeping the equivalent mass of a finger pressing on top of the panel compatible with the vibratory power generated by our system.

#### **13.3.1.2 Characterization of Force Measurement**

The offset load on the force sensor due to the device construction was first measured and subtracted for subsequent processing. Force acquisition was characterized by performing measurements with a set of test weights from 50 to 5000 g resulting in a pseudo-linear curve which maps digital data readings from the Arduino board (10-bit values) to the corresponding force values in Newtons. The obtained map was used in the Pure Data software to read force data.
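A calibration map of this kind can be sketched as a piecewise-linear interpolation between the measured points. The test masses below span the range given in the text, but the 10-bit readings are invented placeholders (the measured values are not reported), and the actual mapping was implemented in Pure Data rather than Python.

```python
import numpy as np

G = 9.80665  # standard gravity in m/s^2

# Test masses in grams (range from the text) paired with HYPOTHETICAL
# 10-bit ADC readings, assumed to have the construction offset already removed.
masses_g = np.array([50, 100, 200, 500, 1000, 2000, 5000], dtype=float)
adc_counts = np.array([12, 22, 43, 105, 208, 412, 1010], dtype=float)
forces_n = masses_g / 1000.0 * G

def counts_to_newtons(counts):
    """Map offset-corrected ADC readings to force by piecewise-linear interpolation."""
    return np.interp(counts, adc_counts, forces_n)

print(counts_to_newtons([43, 300, 1010]))
```

Note that `np.interp` clamps readings outside the calibrated range to the end values; a real implementation may prefer to flag such readings instead.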

#### **13.3.1.3 Characterization of Contact Area Measurement**

Finger contact area is obtained from the data recorded by the infrared camera. Acquired images are processed in real time to extract the contour of the finger pad portion in contact with the panel and to count the number of contained pixels.

The area corresponding to a single pixel (i.e., the resolution of the area measurement system) was calibrated by applying a set of laser-cut adhesive patches of predefined sizes on the top panel. Test weights of 200, 800, and 1500 g were used to simulate the pressing forces used in the experiment described in Sect. 4.2, which result in slightly different distances of the top panel from the camera, influencing its magnification ratio. The measurements were averaged for each pressing force level, obtaining the following pixel size values: 0.001161 mm<sup>2</sup> (200 g), 0.001125 mm<sup>2</sup> (800 g), and 0.001098 mm<sup>2</sup> (1500 g).

Finger contact areas in mm<sup>2</sup> were finally obtained by multiplying the counted number of pixels by the appropriate pixel size value, depending on the applied force.
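The final conversion amounts to a pixel count multiplied by the calibrated pixel size. In the sketch below the pixel sizes are the values reported above, while the binary mask stands in for the contour extraction performed in Max/MSP/Jitter; the function name and the mask representation are assumptions.

```python
import numpy as np

# Calibrated area of one pixel in mm^2, per test weight in grams (from the text)
PIXEL_AREA_MM2 = {200: 0.001161, 800: 0.001125, 1500: 0.001098}

def contact_area_mm2(mask, weight_g):
    """Contact area from a boolean mask marking the pixels inside the finger pad contour."""
    return int(np.count_nonzero(mask)) * PIXEL_AREA_MM2[weight_g]

# Hypothetical mask: a 100 x 100 pixel blob inside a 1280 x 960 frame
frame = np.zeros((960, 1280), dtype=bool)
frame[400:500, 600:700] = True
print(contact_area_mm2(frame, 800))
```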

#### **13.3.1.4 Characterization of Vibration Output**

The accuracy of the device in reproducing a given vibrotactile signal was tested. The test signals were those used in the mentioned experiment: a sine wave at 250 Hz, and white noise band-pass filtered with 48 dB/octave cutoffs at 50 and 500 Hz. Vibration measurements were carried out with a Wilcoxon 736T piezoelectric accelerometer<sup>6</sup> (sensitivity 10.2 mV/(m/s<sup>2</sup>) ±5% at 25 °C, frequency response flat within ±5% in the 5–32200 Hz range) connected to a Wilcoxon iT111M transmitter.<sup>7</sup> The accelerometer was secured to the top of the Touch-Box with double-sided adhesive tape. The AC-coupled output of the transmitter was recorded via an RME Fireface 800 interface as audio signals at 48 kHz with 24-bit resolution.

Vibrations produced by the Touch-Box were recorded at different amplitudes in 2 dB steps, in the range used in the reference experiment. Measurements were repeated by placing 200, 800 and 1500 g test weights on top of the device, accounting for the pressing forces used in the experiment.

The following calculations were performed on the recorded vibration signals to extract acceleration values: (i) Digital values in the range [−1, 1] were translated to a dBFS representation; (ii) voltage values in volts were obtained from dBFS values, based on the nominal input sensitivity of the audio interface (+19 dBu @ 0 dBFS, reference 0.775 V); (iii) acceleration values in m/s<sup>2</sup> were calculated from the voltage values, based on the nominal sensitivity of the accelerometer. Finally, RMS acceleration values in dB (re 10<sup>−6</sup> m/s<sup>2</sup>) were computed over an observation interval of 8 s to minimize the contribution of unwanted external noise. Notice that the considered vibration signals are periodic or stationary.
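The conversion chain above can be sketched as follows, using the nominal figures given in the text. Two simplifications are assumptions of this sketch: the transmitter is treated as having unity gain, and a digital sample value of ±1 is taken to correspond directly to the voltage at +19 dBu, leaving aside whether that specification refers to the peak or the RMS level of a full-scale sine.

```python
import numpy as np

FULL_SCALE_DBU = 19.0      # interface input sensitivity: +19 dBu at 0 dBFS (from the text)
DBU_REF_VOLT = 0.775       # voltage corresponding to 0 dBu (from the text)
SENS_V_PER_MS2 = 10.2e-3   # accelerometer sensitivity: 10.2 mV per m/s^2 (from the text)
ACC_REF = 1e-6             # dB reference acceleration in m/s^2 (from the text)

def samples_to_acceleration(x):
    """Digital samples in [-1, 1] -> acceleration in m/s^2 (unity transmitter gain assumed)."""
    volts_at_full_scale = DBU_REF_VOLT * 10.0 ** (FULL_SCALE_DBU / 20.0)
    return np.asarray(x, dtype=float) * volts_at_full_scale / SENS_V_PER_MS2

def rms_acceleration_db(x):
    """RMS acceleration level in dB re 1e-6 m/s^2 over the whole observation interval."""
    acc = samples_to_acceleration(x)
    return 20.0 * np.log10(np.sqrt(np.mean(acc ** 2)) / ACC_REF)

# Example: 8 s of a 250 Hz sine recorded at 48 kHz with a digital amplitude of 0.01
fs = 48_000
t = np.arange(8 * fs) / fs
print(rms_acceleration_db(0.01 * np.sin(2.0 * np.pi * 250.0 * t)))
```

Because every step is a fixed scale factor, the order of operations does not matter: converting each sample and then taking the RMS gives the same level as converting the RMS of the digital signal.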

#### Amplitude Response

The curves in Fig. 13.2a, b relate the relative amplitudes of the stimuli to the corresponding actual vibration energy produced by the Touch-Box, expressed as RMS acceleration. Vibration acceleration was measured in the range from the initial amplitude used in the reference experiment down to 6 dB below the minimum average vibrotactile threshold found. Generally, vibration amplitude varied consistently with that of the input signal, resulting in a pseudo-linear relationship. However, the three weights resulted in different amplitude offsets, due to mechanical damping. In the analysis of experimental data, this characterization was used for mapping the experimental results to actual RMS vibration acceleration values, in this way compensating for the damping effect of pressing forces on vibration amplitude. As shown in Table 13.1a, the effective step size of amplitude variation for the three weights is consistent across the considered range.

<sup>6</sup>https://buy.wilcoxon.com/736t.html (last accessed on Dec. 21, 2017).

<sup>7</sup>https://buy.wilcoxon.com/it100-200m.html (last accessed on Dec. 21, 2017).

**Fig. 13.2** Amplitude variation of different stimuli. Figure reprinted from [33] (Appendix)

**Table 13.1** Mean and standard deviation (in brackets) of (a) RMS acceleration amplitude variation (original step size 2 dB), and (b) offsets relative to amplitudes measured for the 200 g weight. Table reprinted from [33] (Appendix)


Table 13.1b shows amplitude offsets for the 800 and 1500 g weights, relative to the measured amplitudes for the 200 g weight. Overall, the performed characterization shows that the device behaves consistently with regard to amplitude and energy response, with slightly higher accuracy when sinusoidal vibration is used.

#### Frequency Response

Figure 13.3 shows the measured magnitude spectra of noise stimuli, for three sample amplitudes ranging from the initial level used in the experiment down to 6 dB below the minimum average threshold found. In addition to the damping effect on RMS vibration amplitudes noted above (which is the only effect measured in the sinusoidal condition), in the case of the noise stimulus the three weights resulted in spectral structures slightly different from the original flat spectrum in the 50–500 Hz range used as input signal. For a given weight, the *spectral centroid* (i.e., the amplitude-weighted average frequency, which roughly represents the 'center of mass' of a spectrum) of noise vibration was found to generally decrease with the signal amplitude: For the 200 g weight, the spectral centroid varied from 188 Hz at the initial amplitude to 173 Hz at 6 dB below the minimum average threshold found. For the 800 and 1500 g weights, the spectral centroid varied, respectively, from 381.3 to 303 Hz and from 374.5 to 359.4 Hz.
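As a minimal sketch of the spectral centroid defined above, the function below weights each FFT bin frequency by its magnitude. The FFT size matches the one quoted for Fig. 13.3, whereas the Hann window and the function name are assumptions; the chapter does not specify how the spectra were windowed.

```python
import numpy as np

def spectral_centroid(x, fs, n_fft=32768):
    """Amplitude-weighted average frequency (Hz) of the magnitude spectrum of x."""
    x = np.asarray(x, dtype=float)[:n_fft]
    # Periodic Hann window (an assumption of this sketch) to limit spectral leakage
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(len(x)) / len(x))
    magnitude = np.abs(np.fft.rfft(x * window, n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return float(np.sum(freqs * magnitude) / np.sum(magnitude))

# Example: two equal-amplitude components centred on FFT bins 100 and 300
fs, n = 48_000, 32768
t = np.arange(n) / fs
f_low, f_high = 100 * fs / n, 300 * fs / n
signal = np.sin(2.0 * np.pi * f_low * t) + np.sin(2.0 * np.pi * f_high * t)
print(spectral_centroid(signal, fs))
```

For the example signal the centroid falls midway between the two components, which matches the 'center of mass' interpretation. An all-zero input would cause a division by zero and is not handled here.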

The characterization of vibrotactile feedback highlighted strengths and weaknesses of the Touch-Box implementation, making it possible to validate experimental results and to compensate for hardware limitations (namely, amplitude damping and non-flat spectral response). For instance, as mentioned in Sect. 4.2.4, the finding that the peak energy of the stimuli in the higher force condition shifted above the region of maximum sensitivity (200–300 Hz, [39]) suggests that the vibrotactile threshold measured in that case was likely higher than in reality.

## *13.3.2 The VibroPiano*

Historically, the reproduction of haptic properties of the piano keyboard has been first approached from a kinematic perspective with the aim of recreating the mechanical response of the keys [4, 28], also in light of experiments emphasizing the sensitivity of pianists to the keyboard mechanics [13]. Only recently, and in parallel to industrial outcomes [16], researchers started to analyze the role of the vibrotactile feedback component as a potential conveyor of salient cues. An early attempt by some of the present authors claimed possible qualitative relevance of these cues while playing a digital piano [12]. A few years later, a refined digital piano prototype was implemented, capable of reproducing various types of vibrotactile feedback at the keyboard. This new prototype was used to test whether the nature of feedback can affect pianists' performance and their perception of quality features (see Sect. 5.3.2.2).

**Fig. 13.3** Acceleration magnitude spectrum (FFT size 32768) of the noise stimuli for the three test weights (dB, re 10<sup>−6</sup> m/s<sup>2</sup>). Colors represent different amplitudes: start amplitude (black), −18 dB, i.e., about the minimum vibrotactile threshold found in the experiment (magenta), and −24 dB (cyan). Horizontal lines show RMS acceleration amplitudes. Figure reprinted from [33] (Appendix)

#### **13.3.2.1 Implementation**

A digital piano was used as a platform for the development of a keyboard prototype yielding vibrotactile feedback. After some preliminary testing with different tactile actuators attached to the bottom of the original keyboard, the instrument was disassembled, and the keyboard detached from its metal casing and screwed to a thick plywood board (see Fig. 13.4). This customization improved the reproduction of vibrations at the keys: on the one hand by avoiding hard-to-control nonlinearities arising from the metal casing, and on the other hand by conveying higher vibratory energy to the keys thanks to the stiffer wooden board. Two Clark Synthesis TST239 tactile transducers<sup>8</sup> were attached to the bottom of the wooden board, placed in correspondence with the lower and middle octaves, respectively, in this way conveying vibrations at the most relevant areas of the keyboard [11]. Once equipped in this way, the keyboard was laid on a stand, interposing foam rubber at the contact points to minimize the formation of additional resonances.

<sup>8</sup>http://clarksynthesis.com/ (last accessed on Dec. 21, 2017).

**Fig. 13.4** The VibroPiano setup. Figure adapted from [10]

The transducers were driven by a high-power stereo audio amplifier set to dual mono configuration and fed with a monophonic signal sent by a host computer via an RME Fireface 800 audio interface. The audio interface received MIDI data from the keyboard and passed it to the computer, where sound and vibrotactile feedback were generated, respectively, by Modartt Pianoteq,<sup>9</sup> a physical modeling piano whose audio feedback was delivered to the performer via earphones, and by a software sampler playing back vibration samples, which were prepared beforehand as described below. A diagram of the setup is shown in Fig. 13.5.

#### **13.3.2.2 Preparation of Vibration Samples**

Recording of Piano Keyboard Vibrations

Vibrations were recorded at the keyboard of two Yamaha Disklavier pianos (a grand model DC3-M4, and an upright model DU1A with control unit DKC-850) via the same measurement setup described in Sect. 13.3.1.4. The accelerometer was secured to each measured key with double-sided tape to ensure stable coupling and easy removal. As explained in Sect. 4.3.1, Disklavier pianos can be controlled remotely by sending them MIDI control data. This made it possible to automate the recording of vibration samples by playing back MIDI 'note ON' messages at various MIDI velocities for each of the 88 actuated keys of the Disklaviers.

<sup>9</sup>https://www.pianoteq.com/ (last accessed on Dec. 21, 2017).

**Fig. 13.5** Schematic of the VibroPiano setup. Figure reprinted from [10]

The choice of suitable MIDI velocities required an analysis of the Disklaviers' dynamic range. The MIDI volume of the two Disklavier pianos was first set to approximate a linear response to MIDI velocity, according to Yamaha's recommendations. The acoustic dynamic response to MIDI velocity was then measured by means of a KEMAR mannequin<sup>10</sup> (grand Disklavier) or a sound level meter (upright Disklavier) placed above the stool, approximately at the height of a pianist's ears [11]. The loudness of an A4 tone was measured for ten evenly spaced values of MIDI velocity in the range 2–127. Each measurement was repeated several times and averaged. Results are reported in Table 13.2. In accordance with a previous study [15] that measured temporal and dynamic accuracy of computer-controlled grand pianos in reproducing MIDI control data, our results show a flattened dynamic response for high velocity values. Also, the upright model shows a narrower dynamic range, especially for low velocity values.

<sup>10</sup>http://kemar.us/ (last accessed on Dec. 21, 2017).


**Table 13.2** Sound level of an A4 tone, generated by the two Disklavier pianos for various MIDI velocities

Based on the above results, MIDI velocities 12, 23, 34, 45, 56, 67, 78, 89, 100, 111 were selected for acquiring vibration recordings. This substantially covered the entire dynamic range of the pianos with evenly spaced velocity values. Extreme velocity values were excluded, as they result in flattened dynamics or unreliable response. For each of the selected velocity values, acceleration samples were recorded at the 88 keys of the two pianos. Recordings for each key/velocity combination lasted 16 seconds, thus amply describing the decay of vibration amplitude. Since the accelerometer was mounted on top of the measured keys, the initial part of the recorded samples represents the displacement of the keys being depressed by the actuation mechanism, until they hit the keybed and stop (see Fig. 4.4). Since kinesthetic components were not of interest for our research, these transients were manually removed from each sample, leaving only the purely vibratory part.
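The recording plan described above can be sketched in a few lines. The following is only an illustration, not the authors' tooling: it builds the raw three-byte 'note ON' messages for every key/velocity pair and estimates the total recording time per piano. The MIDI channel and all names are assumptions introduced for the example.

```python
# Sketch of the automated recording plan: one 'note ON' per key/velocity pair.
VELOCITIES = list(range(12, 112, 11))   # 12, 23, ..., 111 (values from the text)
FIRST_NOTE, LAST_NOTE = 21, 108         # MIDI note numbers of an 88-key piano (A0..C8)
RECORD_SECONDS = 16                     # duration of each recording (from the text)
NOTE_ON_STATUS = 0x90                   # note ON on MIDI channel 1 (channel is an assumption)


def note_on_messages():
    """Yield raw 3-byte MIDI 'note ON' messages for every key/velocity pair."""
    for note in range(FIRST_NOTE, LAST_NOTE + 1):
        for velocity in VELOCITIES:
            yield bytes([NOTE_ON_STATUS, note, velocity])


messages = list(note_on_messages())
total_hours = len(messages) * RECORD_SECONDS / 3600
print(len(messages), "recordings per piano, about", round(total_hours, 1), "hours")
```

With 88 keys and ten velocities, this amounts to 880 recordings per piano, which illustrates why automating the procedure was worthwhile.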

#### Synthetic Vibration Samples

A further set of vibration samples was synthesized, aiming to reproduce the amplitude envelope of the real vibration signals while changing only their spectral content. Synthetic signals for each key and each of the selected velocity values were generated as follows. First, white noise was bandlimited to the range 20–500 Hz, covering the vibrotactile bandwidth [40] while being compatible with audio equipment.<sup>11</sup> The bandlimited noise was then passed through a second-order resonant filter centered at the fundamental frequency of the note corresponding to the key. The resulting signal was modulated by the amplitude envelope of the matching vibration sample recorded on the grand piano, which in turn was estimated from the energy decay curve of the sample via the Schroeder integral [37]. Finally, the power (RMS level) of the synthetic sample was equalized to that of the corresponding recorded sample.

<sup>11</sup>In the low range, audio amplifiers are usually designed to handle signals down to 20 Hz.
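A minimal sketch of this synthesis chain is given below, using only the Python standard library. It is not the authors' implementation: the initial 20–500 Hz band-limiting is omitted for brevity, the envelope is assumed to be the square root of the normalized energy decay curve, and the resonator pole radius `r`, the toy "recorded" sample and all function names are hypothetical.

```python
import math
import random


def schroeder_envelope(sample):
    """Amplitude envelope from the energy decay curve (backward integration)."""
    edc = [0.0] * len(sample)
    running = 0.0
    for i in range(len(sample) - 1, -1, -1):    # integrate energy from the end backwards
        running += sample[i] ** 2
        edc[i] = running
    total = edc[0] if edc and edc[0] > 0 else 1.0
    return [math.sqrt(e / total) for e in edc]  # assumption: envelope = sqrt of normalized EDC


def resonator(signal, f0, fs, r=0.995):
    """Second-order (two-pole) resonant filter centered near f0."""
    a1 = 2.0 * r * math.cos(2.0 * math.pi * f0 / fs)
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1                          # shift the filter state
    return out


def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def synthesize(recorded, f0, fs, seed=0):
    """Noise -> resonator -> envelope of the recording -> RMS matched to the recording."""
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in recorded]  # white noise (band-limiting omitted)
    shaped = resonator(noise, f0, fs)
    envelope = schroeder_envelope(recorded)
    synthetic = [s * e for s, e in zip(shaped, envelope)]
    gain = rms(recorded) / rms(synthetic)               # RMS equalization
    return [gain * s for s in synthetic]


# Toy stand-in for a recorded sample: a decaying 110 Hz sine, one second long.
fs = 2000
recorded = [math.exp(-3.0 * n / fs) * math.sin(2 * math.pi * 110 * n / fs) for n in range(fs)]
synthetic = synthesize(recorded, f0=110.0, fs=fs)
```

The backward cumulative sum is what makes the Schroeder estimate smooth and monotonically decreasing, which is convenient when the curve is reused as a modulating envelope.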

#### Vibration Sample Libraries

The recorded and synthetic vibration sample sets were stored in the software sampler, which offers sample interpolation across MIDI velocities. Overall, three sample libraries were created: two from recordings on the grand and upright Disklavier pianos, and one from the generated synthetic samples.

#### **13.3.2.3 Characterization and Calibration**

As suggested in the Chapter, to make sure that the piano prototype could accurately reproduce the designed audio and tactile feedback, it was subjected to a calibration procedure dealing with the following aspects: (i) auditory loudness; (ii) keyboard velocity response; (iii) amplitude and frequency response of vibrotactile feedback.

#### Loudness Matching

As a first step, the loudness of the piano synthesizer at the performer's ear was matched to that of the Disklavier pianos. The piano synthesizer was set to simulate either a grand or an upright piano, to match the character of the reference Disklaviers. Measurements were taken with the KEMAR mannequin wearing earphones by having Pianoteq play back A notes in all octaves at the previously selected velocities. By using the volume mapping feature of Pianoteq, which allows one to set the volume of each key across the keyboard independently, the loudness of the piano synthesizer was then matched to the measurements taken on the Disklavier pianos as described in Sect. 13.3.2.2.

#### Keyboard Velocity Calibration

As expected, the keyboards of the Disklaviers and that of the Galileo digital piano have markedly different response dynamics due to their different mechanics and mass. Once the loudness of the piano synthesizer was set, the velocity response of the digital piano keyboard was matched to that of the Disklavier pianos.

The keyboard response was adjusted via the velocity calibration routine included with Pianoteq, which was performed by an experienced pianist first on the Disklavier pianos (this time used as silent MIDI controllers driving Pianoteq) and then on the digital keyboard. Fairly different velocity maps were obtained. By making use of a MIDI data filter, each point of the digital keyboard velocity map was projected onto the corresponding point of the Disklavier velocity map. Two maps were therefore created, one for each synthesizer-Disklavier pair (grand and upright models). The resulting key velocity transfer characteristics were then independently checked by two more pianists, to validate their reliability and neutrality. Such maps ensured that, when a pianist played the digital keyboard at a desired dynamics, the generated auditory and tactile feedback was consistent with that of the corresponding Disklavier piano.

#### Spectral Equalization

As a final refinement, the vibratory frequency response of the setup was analyzed and then equalized for spectral flattening. Despite the optimized construction, spurious resonances were still present in the keyboard-plywood system, and additionally, the transducers' frequency response exhibits a prominent notch around 300 Hz.

The overall frequency response of the transduction-transmission chain was measured at all the A keys, leading to an average magnitude spectrum that, once inverted, provided the spectral flattening equalization characteristics shown in Fig. 13.6. The 300 Hz notch of the transducers was thus compensated along with resonances and anti-resonances of the mechanical system.
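The averaging and inversion step can be illustrated as follows. This is a sketch under stated assumptions rather than the procedure actually used: the responses are assumed to be available in dB on a common frequency grid, the average is taken in dB, the curve is centered on the mean level, and the measurement values are made up for the example.

```python
def flattening_curve_db(responses_db):
    """Average per-key magnitude responses (in dB) and invert the result.

    responses_db: list of equally long lists, one per measured key, each holding
    the magnitude in dB at the same frequency bins.
    """
    n_keys = len(responses_db)
    n_bins = len(responses_db[0])
    average = [sum(resp[k] for resp in responses_db) / n_keys for k in range(n_bins)]
    reference = sum(average) / n_bins          # assumption: flatten around the mean level
    return [reference - a for a in average]    # inverted curve: boosts dips, cuts peaks


# Made-up responses of three keys at four frequency bins (values in dB).
measured = [
    [-6.0, 3.0, -12.0, -1.0],
    [-5.0, 4.0, -14.0, 0.0],
    [-7.0, 2.0, -13.0, -2.0],
]
eq = flattening_curve_db(measured)
flattened = [m + e for m, e in zip(measured[1], eq)]   # one key after equalization
```

Because the correction is derived from the average, individual keys are only approximately flattened, which is consistent with approximating the curve by a few parametric filter bands.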

In order to prevent the generation of resonance peaks along the keyboard, the equalization curve was approximated using a software parametric equalizer in series with the software sampler that reproduced vibration signals. Focusing on the tactile bandwidth range, the approximation made use of a shelving filter providing a ramp climbing by 18 dB in the range 100–600 Hz, and a 2nd-order filter block approximating the peak around 180 Hz.

At the present stage, the VibroPiano has undergone informal evaluation by several pianists, who gave very positive feedback. Moreover, as described in Sect. 5.3.2.2, it has been used to test how different vibrotactile feedback (namely, realistic, realistic with increased intensity, synthetic, no feedback) may influence the user experience and perception of quality features such as control of dynamics, loudness, richness of tone, naturalness, engagement and general preference.

**Fig. 13.6** Spectral flattening: average equalization curve. Figure reprinted from [10]

**Fig. 13.7** The HSoundplane

## *13.3.3 The HSoundplane*

The HSoundplane, shown in Fig. 13.7, is a multi-touch musical interface prototype offering multi-point, localized vibrotactile feedback. The main purpose of the interface is to provide an open and versatile framework allowing experimentation with different audio-tactile mappings, for testing the effectiveness of vibrotactile feedback in musical practice.

#### **13.3.3.1 Hardware Implementation**

Most current touchscreen technology still lacks finger pressure sensing<sup>12</sup> and often does not offer satisfying response times for use in real-time musical performance. To overcome these issues, our prototype was developed based on the Madrona Labs Soundplane: an advanced musical controller, first described in [19] and now commercially available.<sup>13</sup> The interface allows easy disassembly and is potentially open to hacking, which was required for our purpose. The Soundplane has a large multitouch and pressure-sensitive surface based on ultra-fast patented capacitive sensing technology, offering tracking times in the order of a few ms, as opposed to the lag ≥50 ms of the current best touchscreen technology [8]. Its sensing layer uses several carrier antennas, each transporting an audio-rate signal at a different fixed frequency. Separated by a dielectric layer, transversal pickup antennas catch these signals, which are modulated by changes of thickness in the dielectric layer due to finger pressure on the Soundplane's flexible surface. An internal DSP takes care of generating the carrier signals and decoding the touch-modulated signals for multiple fingers. The computed touch data (describing multi-finger positions and pressing forces) are sent to a host computer via USB connection. The Soundplane's sensing technology requires the top surface and underlying layers to be as flat and uniform as possible. A software calibration routine is provided to compensate for minor irregularities.

<sup>12</sup>With the exception of the recent Force Touch technology by Apple.

<sup>13</sup>www.madronalabs.com (last accessed on Nov 29, 2017).

In the remainder of this section, we describe how the original Soundplane was augmented with vibrotactile feedback, resulting in the HSoundplane prototype (where 'H' stands for 'haptic').

#### Construction

The original Soundplane's multilayered design consists of a top tiled surface—a sandwich construction made of wood veneer stuck to a thin Plexiglas plate and a natural rubber foil—resting on top of the capacitive sensing layer described above. Since these components are simply laid upon each other and kept in place with pegs built into the wooden casing, it is quite simple to disassemble the structure and replace some of its elements.

To implement a haptic layer for the Soundplane, we chose a solution based on low-cost piezoelectric elements: In addition to the advantages pointed out in Sect. 13.2, such devices are extremely thin (down to a few tenths of a millimeter) and allow scaling up due to their size and cheap price. The proposed solution makes use of piezo actuator disks arranged in a 30 × 5 matrix configuration matching the tiled pads on the Soundplane surface, so that each actuator corresponds to a tile (see Fig. 13.8).

In order to maximize the vibration energy conveyed to the fingers, vibrotactile actuators should be ideally placed as close as possible to the touch location. The actuators layer was therefore placed between the top surface and the sensing components. However, such a solution poses some serious challenges: The original flexibility, flatness, and thickness of the layers above the sensing components have to be preserved as much as possible, so as to retain the sensitivity and calibration uniformity of the Soundplane's sensor surface. To this end, the piezo elements were wired via an ad hoc designed flexible PCB foil with SMD soldering techniques and electrically conductive adhesive transfer tapes (3M 9703). The PCB with attached piezo elements was laid on top of an additional thin rubber sheet, with holes corresponding to each piezo element: This ensures enough free space to allow optimal mechanical deflection of the actuators, and also improves the overall flexibility of the construction. However thin, the addition of the actuators layer alters the overall thickness of the hardware. For this reason, we had to redesign the original top surface replacing it with a thinner version. As a result, the thickness of the new top surface plus the actuators layer matches that of the original surface. Figure 13.9 shows an exploded view of the HSoundplane construction, consisting of a total of nine layers.

**Fig. 13.8** Schematic of the actuators' control electronics: **a** piezo actuators on flexible PCBs (simplified view); **b** slave PCBs with audio-to-haptic drivers and routing electronics; **c** master controller. Notice: The 1st and 32nd channels are unused

#### Electronics

Based on off-the-shelf components, custom amplifying and routing electronics were designed to drive piezo elements with standard audio signals.

In order to provide effective vibrotactile feedback at the HSoundplane's surface, some key considerations were made. Driving piezo actuators requires voltage values (in our case up to 200 Vpp) that are not compatible with standard audio equipment. This, together with the large number of actuators used in the HSoundplane (150), poses a non-trivial electrical challenge. Since the signals are in the analog domain, using a separate audio signal for each actuator would be overkill. Therefore, we considered using a maximum of one channel per column of pads, reducing the requirements to 30 separate audio channels. These are provided by a MADI system<sup>14</sup> formed by an RME MADIface USB<sup>15</sup> hooked to a D.O.TEC ANDIAMO 2<sup>16</sup> AD/DA converter. To comply with the electrical specifications of the piezo transducers, the analog audio signals produced by the MADI system (whose output sensitivity was set to 9 dBu @ 0 dBFS, reference 0.775 V,<sup>17</sup> resulting in a maximum voltage of 2.18 V) must be amplified by about a factor 50 using a balanced signal. Routing continuous analog signals is also a delicate issue, since the end user must not notice any disturbance or delay in the feedback.

<sup>14</sup>Multichannel Audio Digital Interface: https://www.en.wikipedia.org/wiki/MADI (last accessed on Nov 29, 2017).

<sup>15</sup>https://www.rme-audio.de/en/products/madiface\_usb.php (last accessed on Dec. 21, 2017).

<sup>16</sup>http://www.directout.eu/en/products/andiamo-2/ (last accessed on Dec. 21, 2017).
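The maximum voltage quoted above follows directly from the definition of the dBu scale. The short check below reproduces it; the function name is introduced only for illustration.

```python
DBU_REFERENCE_V = 0.775   # reference voltage for 0 dBu, as quoted in the text


def dbu_to_volts(level_dbu):
    """Convert a level in dBu to volts relative to the 0.775 V reference."""
    return DBU_REFERENCE_V * 10 ** (level_dbu / 20)


v_max = dbu_to_volts(9.0)   # output sensitivity of 9 dBu at 0 dBFS
print(f"{v_max:.2f} V")     # prints 2.18 V
```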

To address all the issues pointed out above, a solution was designed based on three key integrated-circuit components: (1) Texas Instruments DRV2667<sup>18</sup> piezo drivers that can amplify standard audio signals up to 200 Vpp; (2) serial-to-parallel shift registers with output latches of the 74HC595 family<sup>19</sup>; (3) high-voltage MOSFET relays. For the sake of simplicity, the whole output stage of the HSoundplane was divided into four identical sections, represented in Fig. 13.8, each consisting of (a) a flexible PCB with 40 piezo actuators, connected by a flat cable to (b) a driver PCB with eight audio-to-haptic amplifiers and routing electronics. In order to address the wanted actuators and synchronize their switching with audio signals, (c) a master controller parses the control data generated at the host computer and routes them to the appropriate slave drivers.

<sup>17</sup>For further details, see https://www.en.wikipedia.org/wiki/Line\_level (last accessed on Nov 29, 2017).

<sup>18</sup>http://www.ti.com/product/drv2667 (last accessed on Dec. 21, 2017).

<sup>19</sup>http://www.st.com/content/st\_com/en/products/automotive-logic-ics/flipflop-registers/m74hc595.html (last accessed on Dec. 21, 2017).

**Fig. 13.10** Schematic of a slave driver board: **a** 8-channel audio input; **b** 8 piezo drivers; **c** 40-point matrix of relays individually connected to each piezo actuator; **d** relay control; **e** microcontroller for initialization and synchronization. Figure reprinted from [34]

Figure 13.10 shows the detail of a slave driver board, which operates as follows: (a) Eight audio signals are routed to (b) the piezo drivers, where they are amplified to high voltage and sent to (c) an 8 × 5 relay matrix that connects to each of the piezo actuators in the section. This 40-point matrix is addressed by (d) a chain of serial-to-parallel shift registers commanded by (e) a microcontroller. On start-up, the microcontroller initializes the piezo drivers, setting among other things their amplification level. When in running mode, the slave microcontrollers receive routing information from the master, set a corresponding 40-bit word (each bit corresponding to one actuator) and send it to the shift registers, which individually open or close the relays of the matrix. As shown in Fig. 13.10, each amplified audio signal feeds five points in the relay matrix; therefore, each signal path is hard-coded to five addresses. Such fixed addressing is the main limitation of the current HSoundplane prototype: Each column of five actuators can only be fed with a single vibrotactile signal.
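The 40-bit routing word can be illustrated with a short sketch. It is not the firmware of the prototype: the bit layout (channel × 5 + row), the convention that a set bit closes a relay, and the byte order sent to the chain of five 8-bit shift registers are all assumptions made for the example.

```python
CHANNELS_PER_BOARD = 8   # amplified audio signals per slave board (from the text)
ROWS_PER_CHANNEL = 5     # relay points fed by each signal (from the text)
N_RELAYS = CHANNELS_PER_BOARD * ROWS_PER_CHANNEL   # 40-point matrix


def relay_word(active):
    """Build the 40-bit word for one slave board.

    active: iterable of (channel, row) pairs whose relay should be closed,
    with channel in 0..7 and row in 0..4.
    The bit layout (channel * 5 + row) is an assumption made for this sketch.
    """
    word = 0
    for channel, row in active:
        if not (0 <= channel < CHANNELS_PER_BOARD and 0 <= row < ROWS_PER_CHANNEL):
            raise ValueError(f"no relay at channel {channel}, row {row}")
        word |= 1 << (channel * ROWS_PER_CHANNEL + row)
    return word


def shift_register_bytes(word):
    """Split the word into five bytes, one per 8-bit register, most significant first."""
    return word.to_bytes(N_RELAYS // 8, "big")


word = relay_word([(0, 0), (0, 4), (7, 2)])
payload = shift_register_bytes(word)
```

Note that several bits belonging to the same channel may be set at once, but they all receive the same amplified signal, which mirrors the fixed-addressing limitation discussed above.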

#### **13.3.3.2 Software Implementation**

The original Soundplane comes with a client application for Mac OS, which receives multi-touch data sensed by the interface and transmits them as OSC messages according to an original format named 't3d' (for touch-3d). The t3d data represent touch information for each contacting finger, reporting absolute *x* and *y* coordinates, and normal force along the *z*-axis.

In the HSoundplane prototype, these data are used in real time to generate audio and vibration signals and route the latter to the piezo actuators located at the corresponding *x*- and *y*-coordinates.

#### Relay Matrix Control

Synchronization between vibration signals and the four relay matrices happens at the host computer level. While vibrotactile signals are output by the MADI system, control messages are sent to the master controller via USB. The master controller parses the received messages and consequently addresses the slave driver boards on a serial bus, setting the state of the relay matrices.
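As an illustration of the host-side routing logic, the sketch below maps a touch position to an actuator pad and then to a slave board, a local audio channel and a row. Several points are assumptions: touch coordinates are taken to be normalized to 0..1, and the 30 columns are assumed to occupy, in order, all but the first and last of the 32 available channels (as suggested by the note in the caption of Fig. 13.8). Function names are hypothetical.

```python
N_COLUMNS, N_ROWS = 30, 5      # actuator matrix of the HSoundplane (from the text)
CHANNELS_PER_BOARD = 8         # audio channels per slave driver board (from the text)


def touch_to_pad(x, y):
    """Map a touch position, assumed normalized to 0..1 on both axes, to (column, row)."""
    column = max(0, min(int(x * N_COLUMNS), N_COLUMNS - 1))
    row = max(0, min(int(y * N_ROWS), N_ROWS - 1))
    return column, row


def pad_to_route(column, row):
    """Return (board, local_channel, row) for a pad.

    Assumes the first and last of the 32 channels are skipped, so that
    column 0 uses the second channel of the first board.
    """
    global_channel = column + 1                      # 0-based; channel 0 is unused
    board, local_channel = divmod(global_channel, CHANNELS_PER_BOARD)
    return board, local_channel, row


route = pad_to_route(*touch_to_pad(0.5, 0.9))
```

A control message for the master controller would then carry the board index and the relay to switch, while the vibration signal itself is sent on the matching audio channel.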

The choice of using a master controller, rather than addressing each driver board directly, is motivated by the following observations: First, properly interfacing several external controllers with a host computer can be complex; second, the midterm perspective of developing the HSoundplane into a self-contained musical interface would eventually require dispensing with the controlling computer and working in closed loop. For that purpose, a main processing unit would be needed, which receives touch data, processes them, and generates vibrotactile information.

#### Rendering of Vibrotactile Feedback

Digital musical interfaces generally enable manifold mapping possibilities between the user's gestures and audio output. In addition to what is offered by common musical interfaces, the HSoundplane provides vibrotactile feedback to the user, which requires defining a further mapping strategy. Since the actuators layer is part of the interface itself, we decided to provide the users with a selection of predefined vibrotactile feedback mapping strategies. Sound mapping is freely definable as in the original Soundplane. Three alternative mapping and vibration generation strategies are implemented in the current prototype:


3. A simpler mapping makes use of a fixed-frequency sine wave at 250 Hz for all actuators. This solution maximizes perceptual effectiveness by using a stimulus at the frequency of peak tactile sensitivity [39]. On the other hand, since the produced vibrotactile cues are independent of the sound output, they may occasionally result in a perceptual mismatch between touch and audition. This has yet to be investigated.

In a midterm perspective, the last two mapping strategies could be implemented as a completely self-contained system by relying on the waveform memory provided by the chosen piezo drivers model.

Several other strategies for producing vibrotactile signals starting from the related audio are possible, some of which are described in Sect. 7.3.

#### **13.3.3.3 Characterization**

Vibration measurements were performed with the same setup described in Sect. 13.3.1.4. Initially, four types of piezo actuators with different specifications were selected, each with a different frequency of resonance and capacitance. Since each piezo driver has to feed five actuators in parallel, particular attention was paid to current consumption and heat dissipation. The Murata Electronics 7BB-20-6<sup>20</sup> piezo actuator was eventually selected, since it had the smallest capacitance value among the considered actuators, and therefore the lowest current needs.

Once the piezo layer was finalized, vibrotactile cross talk was informally evaluated. Thanks to the holed rubber layer, which lets actuators vibrate while keeping them apart from each other, the HSoundplane is able to render localized vibrotactile feedback with unperceivable vibration spill at other locations, even when touching right next to the target feedback point.

Vibration frequency response was measured in the vibrotactile range as follows: The accelerometer was attached with double-sided tape at several pads of the top surface, and the underlying piezo transducers were fed with a sinusoidal sweep [9] between 20 and 1000 Hz, at different amplitudes. Making use of the sensitivity specifications of the I/O chain, values of acceleration in m/s<sup>2</sup> and dB (re 10<sup>−6</sup> m/s<sup>2</sup>) were obtained from the digital amplitude values in dBFS. Figure 13.11 shows the results of measurements performed at four exemplary piezo transducers, for the maximum vibration level achievable without apparent distortion. Such signals are well above the vibrotactile thresholds reported in Sect. 4.2 for active touch, effectively resulting in intense tactile sensation. In general, the frequency responses measured at different locations over the surface are very similar in shape, with a pronounced peak at about 40 Hz. In some cases, they show minor amplitude offsets (see, e.g., the response of piezo 102 in Fig. 13.11) that can be easily compensated for.
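The conversion from digital levels to physical acceleration can be sketched as follows. The two calibration parameters stand for the sensitivity specifications of the I/O chain; their names and the numerical values used in the example are placeholders, not the specifications of the actual measurement setup.

```python
import math

REFERENCE_ACCEL = 1e-6   # m/s^2, reference for the dB scale used in the text


def dbfs_to_acceleration(level_dbfs, full_scale_volts, sensitivity_v_per_ms2):
    """Convert a digital level in dBFS to acceleration in m/s^2.

    full_scale_volts: input voltage corresponding to 0 dBFS (hypothetical parameter).
    sensitivity_v_per_ms2: sensitivity of the accelerometer chain in V per m/s^2
    (hypothetical parameter).
    """
    volts = full_scale_volts * 10 ** (level_dbfs / 20)
    return volts / sensitivity_v_per_ms2


def acceleration_db(accel_ms2):
    """Acceleration level in dB re 1e-6 m/s^2."""
    return 20 * math.log10(accel_ms2 / REFERENCE_ACCEL)


# Placeholder values, not the specifications of the actual measurement chain.
accel = dbfs_to_acceleration(-20.0, full_scale_volts=1.0, sensitivity_v_per_ms2=0.01)
level = acceleration_db(accel)
```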

Further measurements are planned in the time domain to test synchronization between audio signals and relay control, and to quantify closed-loop latency from touch events to the onset of vibrotactile feedback. Also, similar to what was done for the Touch-Box (see Sect. 13.3.1.2), we plan to characterize finger pressing force as measured by the HSoundplane.

<sup>20</sup>https://www.murata.com/products/productdetail?partno=7BB-20-6 (last accessed on Dec. 21, 2017).

## **13.4 Conclusions**

A few exemplary interfaces providing vibrotactile feedback were described, which have been recently developed by the authors for the purpose of conducting various perceptual experiments, and for musical applications. Details were given on the design process and on the technological solutions adopted for rendering accurate vibratory behavior. Measurements were performed to characterize the interfaces' input (e.g., finger pressing force, or keyboard velocity) and output (vibratory cues).

It is suggested that the characterization and validation of self-developed haptic devices are especially important when employing them in psychophysical experiments, as well as in evaluation and performance assessments (see the studies reported in Chap. 4, Sect. 5.3.2.2, and Chap. 7). On the one hand, as opposed to relying on assumptions based on components' specifications, characterization offers objective, verified data to designers and experimenters, enabling them, respectively, to refine the developed devices and to better interpret experimental results. For instance, characterization data describing the actual nature of rendered haptic feedback may offer a better understanding of its perceived qualities. On the other hand, the characterization of haptic prototypes, together with their technical documentation, allows reproducible implementations and enables other users and designers to carry on research and development, rather than resulting in one-of-a-kind devices.

**Acknowledgements** The authors wish to thank Randy Jones, the inventor of the original Soundplane, for providing technical support during the development of the HSoundplane prototype, and Andrea Ghirotto and Lorenzo Malavolta for their help in the preparation of the piano vibration samples. This research was pursued as part of project AHMI (Audio-Haptic modalities in Musical Interfaces, 2014–2016), funded by the Swiss National Science Foundation.

## **References**



# **Glossary and Abbreviations**


**Exciter** See actuator.

**Filter** A generic tool for data or signal processing. For example, in the case of signal processing, low-pass or high-pass filters shape the frequency spectrum of a signal by respectively attenuating frequencies above or below their cutoff frequency, while band-pass filters attenuate frequencies below and above a certain range. With regard to data processing, MIDI filters are used to modify a MIDI data stream, e.g. by letting only certain messages pass through.

**Force Feedback** Same as reactive force. See kinaesthetic feedback.


**Tactor** See actuator.

**Vibrotactile** Relative to the perception of vibration through touch (vibrotaction).

**Virtual Musical Instrument** A software simulation of a musical instrument (either existing or not) that generates sound in response to data input (e.g. MIDI or OSC). When coupled with a digital musical interface, a complete DMI is created.